5 Replies Latest reply: Mar 16, 2012 7:28 PM by Stefan Wühl RSS

    "Incremental" load of text files

    cristian ivanoff

      Hello all!


      I have a problem loading textfiles.


      Once a month I get a text file delivered to a folder (file name contains date/period). I want to make a qvd file that is loaded incremental with the textfiles so I made a qvd-file as an initial file. Then I load from the initial qvd file and concatenate with the new files. Last thing I do is to store the qvd file again with the new data. The thing is that I dont want to reload old files that are residing in the same folder, that is I want to exclude all duplicates.


      First I load my QVD. I only use this script once to make the QVD.


      ,filename() as FileName
      from Customer_201101.spt
      (txt, codepage is 1252, embedded labels, delimiter is ';', msq);
      store InitQVD into InitQVD.qvd;


      My script that I run each time is this:


      Load * from InitQVD.qvd (QVD);
      load * where not exists(FileName); // I dont want to reload files that already exists in the qvd file.
      load *
      ,filename() as FileName
      from Customer_*.spt
      (txt, codepage is 1252, embedded labels, delimiter is ';', msq);
      store Customer into InitQVD.qvd;


      The result of this is that the document loads all text files all the time so the stored qvd is storing duplicates and more...


      So, the question is, what am I doing wrong?


      Is this the best way of doing this? Are there better ways of handling this? Database is not an option. Should I just reload all text files each time I get a new text file?


      I tried to use the ADD function in partial load but it takes forever to load and I cant confirm that the 'where not exists(FileName)' works....


      All suggestions are appreciated.




        • "Incremental" load of text files
          Stefan Wühl



          I don't have much time to look into that right now, but it seems similar to what I discussed today with Badr Harfoush. Maybe it also gives you kind of kickoff, though I notice that your setting is slightly different (no unique timestamp / sequence number in filename, right?):


            • "Incremental" load of text files
              cristian ivanoff

              Hello swuehl,

              Thank you for the tip. Im reading it right now. The sequence number I have on the filename would be the period? Each file has the filename as follows: Customer_201101, Customer_201102 and so on...


              I downloaded the example attached by you and will try to change my code accordingly.




                • "Incremental" load of text files
                  Stefan Wühl

                  Yes, could just be the period, should work almost right out of the box.

                  Somewhere in the script I used subfield('$(File)','_',5) where 5 was indicating the position of the sequence number (string parts delimited by '_'), this should be 2 in your case (two subfields, Customer and period, delimited by '_', you want the second subfield).


                  I like your idea using exists, but I would need to look a bit deeper into this.


                  But, I don't think you can use exists() on a preceding load like above effectively (if at all).

                  Using a for each loop as in my code, you only touch the files that you are interested in. I think this should be advantegeous over using some records based decision which data to load.


                  Hope this helps and have a nice weekend,


              • "Incremental" load of text files
                cristian ivanoff


                Now this is working very good with swuehl's example in http://community.qlik.com/thread/49224.


                But, , now I would like to add another file with a similar filename as my Customer file (same column names). So the folder has two kind of files and both have a period in the file name. How should I tweak my code to load both files?


                Filelist is now:









                My code:


                set vGSMAHSPATH='..\QVSource\TestFolder\';

                Set vQvdFile = 'Facts.qvd';

                Set vFTable = 'fTable';

                Set vSourceFile = 'Customer_';

                Let vQVDExists = if(FileSize('$(vQVD)$(vQvdFile)')>0,-1,0);


                IF $(vQVDExists) THEN



                          load * from $(vQVD)$(vQvdFile) (QVD);




                IF $(vQVDExists) THEN



                          LOAD max(FILEPERIOD) as maxSeq

                          FROM $(vQVD)$(vQvdFile) (qvd);


                          LET vMaxSeq = peek('maxSeq');

                          DROP table maxseqTab;


                ELSE                                                            // QVD does not exist

                          LET vMaxSeq = 0;           // No QVD. Force full reload



                For each File in filelist('$(vGSMAHSPATH)$(vSourceFile)*')


                          IF left(right('$(File)',10),6) > $(vMaxSeq) THEN



                                    load Distinct



                      FileName() as FILNAME,

                               left(right(FileName(),10),6) as FILEPERIOD



                                    (txt, codepage is 1252, embedded labels, delimiter is ';', msq);


                NEXT File


                STORE $(vFTable) into $(vQVD)$(vQvdFile);