- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
INCREMENTAL LOAD with UPDATE and NEW rows without a unique identifier
Dear ladies and gentlemen,
I am having an issue with my incremental load script, and I am not 100% sure how I can solve this issue in Qlikview,
My first file nov2014 includes data from 2013-01-01 and my second file dec2014 includes partly updated data since 2013-01-01 including an additional month of dec2014 that the previous file does not have.
Some content from nov2014 file:
Year Month Customer Cost Amount
2013 01 AB 100
2013 01 CD 50
2014 02 AB 90
2014 02 CD 80
UPDATED and NEW content from dec2014 file:
Year Month Customer Cost Amount
2013 01 ABC 100 // Here we have a new customer name
2013 01 CD 45 // Here we have a different cost amount
2014 02 AB 90 // Same as in nov2014
2014 02 CD 80 // Same as in nov2014
2014 12 EF 70 // NEW rows
2014 12 GH 60 // NEW rows
Data file analysis:
There are NO unique row identifier or primarykey. This is basically all we have.
My question:
How can I create an INCREMENTAL LOAD in Qlikview that UPDATES rows as well as ADD new rows into a .QVD file.
I am not sure if this can be done, even if I use a composite key using [Year &'-'& Month as %YearMonthKey] when concatenating the 2 files(tables).
Adding new additional rows works great using the WHERE NOT EXISTS (%YearMonthKey), but what about updating rows?
We only have Year and Month columns that are constant, but since the Customer name can change as well, I can not use it in the Composite key.
Do any of you have any good ideas or some advice on how I could do this?
Maybe I could use a composite key that includes the Customer name: [Year &'-'& Month &'-'& Customer as %Key] and update the [Cost Amount] field first, and then use a different composite key to update the name of the customers. I am not sure!
Is it even possible to do what I want?
Appreciate all your help and advice!
Sincerely,
Daniel Wardzynski
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi
The point of an incremental load is to retrieve only new and/or changed records from the database to Qlikview. Normally this is done with a monotonically increasing key or date, or a date created or last updated field in the database. Then a simple where clause in your SQL query will fetch only the relevant data from the database.
If you need to create a key and use WHERE EXISTS in Qlikview, then you are first fetching all the data from the database and then filtering it in Qlikview, which kind of negates the benefits of an incremental load.
So you will need to pass something into the SQL SELECT that allows filtering there.
HTH
Jonathan
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your reply Jonathan.
[Where Not Exists()] is a vital part of an Incremental load (Insert and Update) scenario.
Take a look at this thread: Incremental Load in QlikView – Part2 | Learn QlikView
It seems that if I have 4 rows in my nov2014 file that share the same YEAR and MONTH as 2 rows in my dec2014 file. For instance:
nov2014 file contains:
Year Month Customer Cost Amount
2013 9 AB 100
2013 9 AB 200
2013 9 AB 300
dec2014 file contains:
Year Month Customer Cost Amount
2013 9 AB 999
Then the outcome is only:
Year Month Customer Cost Amount
2013 9 AB 999
This happens when I am using WHERE NOT EXISTs (%YearMonthKey);
// Year &'-'& Month &'-'& as %YearMonthKey
So it basically deletes and replaces ALL OLD rows with the same year and month, with NEW rows with the same year and month, even if I have fewer rows. At the same time it adds all new rows for the new month: december 2014, that did not exist in the previous file: nov2014.
Am I on the right track folks? Is this correct?
In that case, I can use this kind of logic, and later figure out a better way to update the [Customer] names in order to concatenate different customer names into 1 name.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Daniel,
The problem is apparently in the fact that the customer name can be changed. If not for your comment, it is impossible to tell that AB and ABC is the same customer. But you should be able to know it somehow to add this comment. Is there something like "customer id" that never changes? If yes, you could've mapped to it and use in the key.
Regards,
Michael
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Michael,
Unfortunately there are no unique identifiers in these files, so no unique customer number. What I am thinking is creating a change-map that maps all new and changed Customer names that are connected to a particular customer name, that way we can maintain consistency with the customer names.
What do you think about this?
What about my WHERE NOT EXISTs incremental load logics above. Is my thinking regarding the actual row updates correct that I wrote above?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The change map sounds good, in fact you're making customer name a reliable identifier, So, the key combining Year, Month, and Customer is a way to go.
Yes, WHERE NOT EXISTS is good to use here. Typically I create the key in the first table only, e.g.
Year & Month &Customer as %Key
And when loading the second table, use this:
WHERE NOT EXISTS (%Key, Year & Month &Customer)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Micheal,
Thank you for your reply and clarification. With the code below I am basically replacing all the OLD rows that have the Year &'-'& Month as %Key, with NEW rows, even if there is only 1 new row, it completely replaces all OLD rows with the same %Key (deletes all the old rows and replaces with 1 new), and this is exactly what I want. Then it also adds(concatenates) NEW rows that DOES NOT exist (do not have the same %Key) since before.
Do you agree with me Micheal?
With the Customer map we discussed I should do it before I run the Incremental load, am I correct?
This is my code in 2 parts, first the initial load and then the incremental load:
-----------------------------------------------
// PART 1
// Initial Load
Facttable:
LOAD
Year &'-'& Month as %Key,
Year,
Month,
Customer,
[Cost amount]
FROM ..2014nov.xlsx
STORE Facttable into Extract_Facts.qvd;
DROP Table facttable;
// PART 2
// Incremental Load
Facttable:
LOAD
Year &'-'& Month as %Key,
Year,
Month,
Customer,
[Cost amount]
FROM ..2014dec.xlsx
Concatenate
Facttable:
LOAD
%Key,
Year,
Month,
Customer,
[Cost amount]
FROM ..Extract_Facts.qvd (qvd)
WHERE NOT EXISTs (%Key);
STORE Facttable into Extract_Facts.qvd;
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Close... Assuming you have QVD with the previous data, and a file with the additional/changed data, here is what I see:
______________________________________________
CustomerChangeMap:
MAPPING LOAD DISTINCT
OldName,
NewName
FROM ..CustomerChangeMap.xlsx;
//------------------------
// loading new data, in your case Dec2014
Facttable:
LOAD
autonumberhash128(Year & Month & Customer) as %Key, // autonumber to keep key small
Year,
Month,
Customer,
[Cost amount]
FROM ..NewData.xlsx;
// Loading QVD, converting Customer name; load will be non-optimized because of functions
PrevData:
LOAD
autonumberhash128(Year & Month & applymap('CustomerChangeMap',Customer)) as %QVDKey,
Year,
Month,
applymap('CustomerChangeMap',Customer) as Customer // changing Customer name
[Cost amount]
FROM ..Extract_Facts.qvd (qvd);
// adding QVD data to facts except if the key exists in new data
CONCATENATE (Facttable) LOAD
%QVDKey as %Key,
Year,
Month,
Customer,
[Cost amount]
RESIDENT PrevData
WHERE not exists(%Key, %QVDKey);
DROP TABLE PrevData;
STORE Facttable into Extract_Facts.qvd;
____________________________________________
Hope I didn't miss anything essential...