5 Replies Latest reply: Nov 17, 2017 3:31 AM by Marek Waszczak RSS

    load many xml files and save into QVD (incremental load)

    Marek Waszczak


      Could you please help me with the following task?


      I have many XMLs  1 file represent 1 invoice There are many of the files now (60 k ) and I am trying to move all into QVD files and update on the daily basis.


      The only problem in this task is corrected invoices which can multiplicate the same records in master data QVD. So I have to test current load with the master data and take only current records if they are repeated.


      I have wrote this script but there is some problem with the part  where not exist(Inv_ID)


      concatenate scrpt.jpg


      The problem is shown on the diagram below:

      I do apologize for bad resolution but I am in rush so I hope it is enough


      The problem is i am loosing records for each invoices which were stored in master file Bill.qvd





      In read i marked expected result in blue frame you can see actual result of the script


      This is an example and all values are fake but they are reflecting the structure of the information inside XMLs

        • Re: load many xml files and save into QVD (incremental load)
          Youssef Belloum

          first of all, did you removed the EXIT SCRIPT ??

          • Re: load many xml files and save into QVD (incremental load)
            Marek Waszczak

            Of course i did.


            I was changing this code many times and tested every part so this is why exit script is there.


            I am also running this in debug mode but it is not showing which records are added due to concatenation so I have used also logfile but it wasn't helpful as well.

            • Re: load many xml files and save into QVD (incremental load)
              Marcus Sommer

              I think your approach with a simple concatenate couldn't work because with the Inv_ID you have no unique ID per record which is necessary to apply a where not exists-clause by appending data. This meant you will need some more load-steps to detect which ID is new, changed or old. Here an example which creates this for new and changed ID's whereby if I understand it right you will only need the corrected ID's:



              load Inv_ID from Bill.qvd (qvd);



              //load Inv_ID as ID_New from Bill_tmp.qvd (qvd) where not exists(Inv_ID);



              load Inv_ID as ID_Corrected from Bill_tmp.qvd (qvd) where exists(Inv_ID);


              drop tables ID_Exists;

              rename field ID_Corrected tp Inv_ID;



              load * from Bill.qvd (qvd) where not exists(Inv_ID);


              load * from Bill_tmp.qvd (qvd);




              I'm not absolutely sure that my suggestion worked directly without some adjustments but I think it showed how a load-chain could be created to separate the needed ID's into different categories and apply them afterwards as filter (dropping and renaming from tables and fields are essential for this kind of load-logic).


              You could of course also extend the exists-function to a second parameter or applying some other type of filtering maybe with a condition per mapping but you would lose the optimized load of the qvd's (by smaller datasets it's not rellay important but by bigger one it's significant - even if you performed multiple temporary loads to get the ID's it should be faster then with unoptimized loadings).


              A load alternatively quite similar (but ?? assumingly ?? slower than the optimized loads) to the exists-approach could it be to filter the table with inner join / inner keep loadings.


              Beside this I would also consider to create an unique record-id from your source-data, maybe like:


              subfield(filebasename(), '_', 2) & '|' & rowno() as UniqueKey


              - Marcus

              • Re: load many xml files and save into QVD (incremental load)
                Rob Wunderlich

                Since Inv_ID is associated with multiple rows, you cannot use it alone in your exists().


                1. In your load of Bill, load Inv_ID twice, as "Inv_ID" and "Inv_ID_new".


                2. Change your exists clause to exists(Inv_ID_new, Inv_ID)

                3. Drop Field Inv_ID_new.

                As Marcus mentioned you will lose the optimized load, but that's the cost.