Skip to main content
Announcements
Global Transformation Awards submissions are open! SUBMIT YOUR STORY
cancel
Showing results for 
Search instead for 
Did you mean: 
wardzynski
Creator

INCREMENTAL LOAD with UPDATE and NEW rows without a unique identifier

Dear ladies and gentlemen,

I am having an issue with my incremental load script, and I am not 100% sure how I can solve this issue in Qlikview,

My first file nov2014 includes data from 2013-01-01 and my second file dec2014 includes partly updated data since 2013-01-01 including an additional month of dec2014 that the previous file does not have.

Some content from nov2014 file:

Year     Month     Customer     Cost Amount

2013     01          AB               100

2013     01          CD               50

2014     02          AB               90

2014     02          CD               80

UPDATED and NEW content from dec2014 file:

Year     Month     Customer     Cost Amount

2013     01          ABC             100          // Here we have a new customer name

2013     01          CD               45            // Here we have a different cost amount

2014     02          AB               90            // Same as in nov2014

2014     02          CD               80            // Same as in nov2014

2014     12          EF               70             // NEW rows

2014     12          GH               60            // NEW rows

Data file analysis:

There are NO unique row identifier or primarykey. This is basically all we have.

My question:

How can I create an INCREMENTAL LOAD in Qlikview that UPDATES rows as well as ADD new rows into a .QVD file.

I am not sure if this can be done, even if I use a composite key using [Year &'-'& Month as %YearMonthKey] when concatenating the 2 files(tables).

Adding new additional rows works great using the WHERE NOT EXISTS (%YearMonthKey), but what about updating rows?

We only have Year and Month columns that are constant, but since the Customer name can change as well, I can not use it in the Composite key.

Do any of you have any good ideas or some advice on how I could do this?

Maybe I could use a composite key that includes the Customer name: [Year &'-'& Month &'-'& Customer as %Key] and update the [Cost Amount] field first, and then use a different composite key to update the name of the customers. I am not sure!

Is it even possible to do what I want?

Appreciate all your help and advice!

Sincerely,
Daniel Wardzynski

7 Replies
jonathandienst
Partner - Champion III

Hi

The point of an incremental load is to retrieve only new and/or changed records from the database to Qlikview. Normally this is done with a monotonically increasing key or date, or a date created or last updated field in the database. Then a simple where clause in your SQL query will fetch only the relevant data from the database.

If you need to create a key and use WHERE EXISTS in Qlikview, then you are first fetching all the data from the database and then filtering it in Qlikview, which kind of negates the benefits of an incremental load.

So you will need to pass something into the SQL SELECT that allows filtering there.

HTH

Jonathan

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
wardzynski
Creator
Author

Thank you for your reply Jonathan.

[Where Not Exists()] is a vital part of an Incremental load (Insert and Update) scenario.

Take a look at this thread: Incremental Load in QlikView – Part2 | Learn QlikView

It seems that if I have 4 rows in my nov2014 file that share the same YEAR and MONTH as 2 rows in my dec2014 file. For instance:

nov2014 file contains:

Year     Month     Customer     Cost Amount

2013     9             AB               100

2013     9             AB               200

2013     9             AB               300

dec2014 file contains:

Year     Month     Customer     Cost Amount

2013     9             AB              999


Then the outcome is only:

Year     Month     Customer     Cost Amount

2013     9             AB              999


This happens when I am using WHERE NOT EXISTs (%YearMonthKey);

// Year &'-'& Month &'-'& as %YearMonthKey


So it basically deletes and replaces ALL OLD rows with the same year and month, with NEW rows with the same year and month, even if I have fewer rows. At the same time it adds all new rows for the new month: december 2014, that did not exist in the previous file: nov2014.


Am I on the right track folks? Is this correct?


In that case, I can use this kind of logic, and later figure out a better way to update the [Customer] names in order to concatenate different customer names into 1 name.


Anonymous
Not applicable

Hi Daniel,

The problem is apparently in the fact that the customer name can be changed.  If not for your comment, it is impossible to tell that AB and ABC is the same customer.  But you should be able to know it somehow to add this comment.  Is there something like "customer  id" that never changes?  If yes, you could've mapped to it and use in the key.

Regards,

Michael

wardzynski
Creator
Author

Hi Michael,

Unfortunately there are no unique identifiers in these files, so no unique customer number. What I am thinking is creating a change-map that maps all new and changed Customer names that are connected to a particular customer name, that way we can maintain consistency with the customer names.

What do you think about this?

What about my WHERE NOT EXISTs incremental load logics above. Is my thinking regarding the actual row updates correct that I wrote above?

Anonymous
Not applicable

The change map sounds good, in fact you're making customer name a reliable identifier,  So, the key combining Year, Month, and Customer is a way to go.

Yes, WHERE NOT EXISTS is good to use here.  Typically I create the key in the first table only, e.g.

Year & Month &Customer as %Key

And when loading the second table, use this:

WHERE NOT EXISTS (%Key, Year & Month &Customer)

wardzynski
Creator
Author

Micheal,

Thank you for your reply and clarification. With the code below I am basically replacing all the OLD rows that have the Year &'-'& Month as %Key, with NEW rows, even if there is only 1 new row, it completely replaces all OLD rows with the same %Key (deletes all the old rows and replaces with 1 new), and this is exactly what I want. Then it also adds(concatenates) NEW rows that DOES NOT exist (do not have the same %Key) since before.


Do you agree with me Micheal?

With the Customer map we discussed I should do it before I run the Incremental load, am I correct?

This is my code in 2 parts, first the initial load and then the incremental load:

-----------------------------------------------

// PART 1

// Initial Load

Facttable:

LOAD

  Year &'-'& Month as %Key,

  Year,

   Month,

   Customer,

    [Cost amount]

FROM ..2014nov.xlsx

STORE Facttable into Extract_Facts.qvd;

DROP Table facttable;

// PART 2

// Incremental Load

Facttable:

LOAD

  Year &'-'& Month as %Key,

  Year,

   Month,

   Customer,

    [Cost amount]

FROM ..2014dec.xlsx

Concatenate

Facttable:

LOAD

  %Key,

  Year,

   Month,

   Customer,

    [Cost amount]

FROM ..Extract_Facts.qvd (qvd)

WHERE NOT EXISTs (%Key);

STORE Facttable into Extract_Facts.qvd;

Anonymous
Not applicable

Close...  Assuming you have QVD with the previous data, and a file with the additional/changed data, here is what I see:
______________________________________________
CustomerChangeMap:
MAPPING LOAD DISTINCT
OldName,
NewName
FROM ..CustomerChangeMap.xlsx;

//------------------------

// loading new data, in your case Dec2014
Facttable:
LOAD
  autonumberhash128(Year & Month & Customer) as %Key,   // autonumber to keep key small
  Year,
  Month,
  Customer,
  [Cost amount]
FROM ..NewData.xlsx;

// Loading QVD, converting Customer name; load will be non-optimized because of functions
PrevData:
LOAD
  autonumberhash128(Year & Month & applymap('CustomerChangeMap',Customer)) as %QVDKey,
  Year,
  Month,
  applymap('CustomerChangeMap',Customer) as Customer     // changing Customer name
  [Cost amount]
FROM ..Extract_Facts.qvd (qvd);

// adding QVD data to facts except if the key exists in new data
CONCATENATE (Facttable) LOAD
  %QVDKey as %Key,
  Year,
  Month,
  Customer,
  [Cost amount]
RESIDENT PrevData
WHERE not exists(%Key, %QVDKey);

DROP TABLE PrevData;

STORE Facttable into Extract_Facts.qvd;

____________________________________________

Hope I didn't miss anything essential...