Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
waszcma1
Partner - Creator II
Partner - Creator II

load many xml files and save into QVD (incremental load)

Hi,

Could you please help me with the following task?

I have many XMLs  1 file represent 1 invoice There are many of the files now (60 k ) and I am trying to move all into QVD files and update on the daily basis.

The only problem in this task is corrected invoices which can multiplicate the same records in master data QVD. So I have to test current load with the master data and take only current records if they are repeated.

I have wrote this script but there is some problem with the part  where not exist(Inv_ID)

concatenate scrpt.jpg

The problem is shown on the diagram below:

I do apologize for bad resolution but I am in rush so I hope it is enough

The problem is i am loosing records for each invoices which were stored in master file Bill.qvd

concatenate.jpg

In read i marked expected result in blue frame you can see actual result of the script

This is an example and all values are fake but they are reflecting the structure of the information inside XMLs

1 Solution

Accepted Solutions
marcus_sommer

I think your approach with a simple concatenate couldn't work because with the Inv_ID you have no unique ID per record which is necessary to apply a where not exists-clause by appending data. This meant you will need some more load-steps to detect which ID is new, changed or old. Here an example which creates this for new and changed ID's whereby if I understand it right you will only need the corrected ID's:

ID_Exists:

load Inv_ID from Bill.qvd (qvd);

//ID_New:

//load Inv_ID as ID_New from Bill_tmp.qvd (qvd) where not exists(Inv_ID);

ID_Corrected:

load Inv_ID as ID_Corrected from Bill_tmp.qvd (qvd) where exists(Inv_ID);

drop tables ID_Exists;

rename field ID_Corrected tp Inv_ID;

Bill:

load * from Bill.qvd (qvd) where not exists(Inv_ID);

concatenate(Bill)

load * from Bill_tmp.qvd (qvd);

...

I'm not absolutely sure that my suggestion worked directly without some adjustments but I think it showed how a load-chain could be created to separate the needed ID's into different categories and apply them afterwards as filter (dropping and renaming from tables and fields are essential for this kind of load-logic).

You could of course also extend the exists-function to a second parameter or applying some other type of filtering maybe with a condition per mapping but you would lose the optimized load of the qvd's (by smaller datasets it's not rellay important but by bigger one it's significant - even if you performed multiple temporary loads to get the ID's it should be faster then with unoptimized loadings).

A load alternatively quite similar (but ?? assumingly ?? slower than the optimized loads) to the exists-approach could it be to filter the table with inner join / inner keep loadings.

Beside this I would also consider to create an unique record-id from your source-data, maybe like:

subfield(filebasename(), '_', 2) & '|' & rowno() as UniqueKey

- Marcus

View solution in original post

5 Replies
YoussefBelloum
Champion
Champion

first of all, did you removed the EXIT SCRIPT ??

waszcma1
Partner - Creator II
Partner - Creator II
Author

Of course i did.

I was changing this code many times and tested every part so this is why exit script is there.

I am also running this in debug mode but it is not showing which records are added due to concatenation so I have used also logfile but it wasn't helpful as well.

marcus_sommer

I think your approach with a simple concatenate couldn't work because with the Inv_ID you have no unique ID per record which is necessary to apply a where not exists-clause by appending data. This meant you will need some more load-steps to detect which ID is new, changed or old. Here an example which creates this for new and changed ID's whereby if I understand it right you will only need the corrected ID's:

ID_Exists:

load Inv_ID from Bill.qvd (qvd);

//ID_New:

//load Inv_ID as ID_New from Bill_tmp.qvd (qvd) where not exists(Inv_ID);

ID_Corrected:

load Inv_ID as ID_Corrected from Bill_tmp.qvd (qvd) where exists(Inv_ID);

drop tables ID_Exists;

rename field ID_Corrected tp Inv_ID;

Bill:

load * from Bill.qvd (qvd) where not exists(Inv_ID);

concatenate(Bill)

load * from Bill_tmp.qvd (qvd);

...

I'm not absolutely sure that my suggestion worked directly without some adjustments but I think it showed how a load-chain could be created to separate the needed ID's into different categories and apply them afterwards as filter (dropping and renaming from tables and fields are essential for this kind of load-logic).

You could of course also extend the exists-function to a second parameter or applying some other type of filtering maybe with a condition per mapping but you would lose the optimized load of the qvd's (by smaller datasets it's not rellay important but by bigger one it's significant - even if you performed multiple temporary loads to get the ID's it should be faster then with unoptimized loadings).

A load alternatively quite similar (but ?? assumingly ?? slower than the optimized loads) to the exists-approach could it be to filter the table with inner join / inner keep loadings.

Beside this I would also consider to create an unique record-id from your source-data, maybe like:

subfield(filebasename(), '_', 2) & '|' & rowno() as UniqueKey

- Marcus

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Since Inv_ID is associated with multiple rows, you cannot use it alone in your exists().

1. In your load of Bill, load Inv_ID twice, as "Inv_ID" and "Inv_ID_new".

2. Change your exists clause to exists(Inv_ID_new, Inv_ID)


3. Drop Field Inv_ID_new.


As Marcus mentioned you will lose the optimized load, but that's the cost.


-Rob

http://masterssummit.com

http://qlikviewcookbook.com

waszcma1
Partner - Creator II
Partner - Creator II
Author

Thank you to all of you for help and explanation.

The problem wasn't so difficult as I thought and now I have better understanding of exist clause.

this what I am using now and it works 100%

XML 2 QVD.jpg