5 Replies Latest reply: Mar 16, 2012 7:28 PM by Stefan Wühl RSS

    "Incremental" load of text files

    Cristian Ivanoff

      Hello all!

       

      I have a problem loading textfiles.

       

      Once a month I get a text file delivered to a folder (file name contains date/period). I want to make a qvd file that is loaded incremental with the textfiles so I made a qvd-file as an initial file. Then I load from the initial qvd file and concatenate with the new files. Last thing I do is to store the qvd file again with the new data. The thing is that I dont want to reload old files that are residing in the same folder, that is I want to exclude all duplicates.

       

      First I load my QVD. I only use this script once to make the QVD.

       

      InitQVD:
      Load
      CustomerName
      ,CustomerID
      ,filename() as FileName
      from Customer_201101.spt
      (txt, codepage is 1252, embedded labels, delimiter is ';', msq);
      
      store InitQVD into InitQVD.qvd;
      
      
      

       

      My script that I run each time is this:

       

      Customer:
      Load * from InitQVD.qvd (QVD);
      
      concatenate 
      load * where not exists(FileName); // I dont want to reload files that already exists in the qvd file.
      load *
      ,filename() as FileName
      from Customer_*.spt
      (txt, codepage is 1252, embedded labels, delimiter is ';', msq);
      
      
      
      
      store Customer into InitQVD.qvd;
      
      

       

      The result of this is that the document loads all text files all the time so the stored qvd is storing duplicates and more...

       

      So, the question is, what am I doing wrong?

       

      Is this the best way of doing this? Are there better ways of handling this? Database is not an option. Should I just reload all text files each time I get a new text file?

       

      I tried to use the ADD function in partial load but it takes forever to load and I cant confirm that the 'where not exists(FileName)' works....

       

      All suggestions are appreciated.

       

      Br

      cristian

        • "Incremental" load of text files
          Stefan Wühl

          Christian,

           

          I don't have much time to look into that right now, but it seems similar to what I discussed today with Badr Harfoush. Maybe it also gives you kind of kickoff, though I notice that your setting is slightly different (no unique timestamp / sequence number in filename, right?):

          http://community.qlik.com/thread/49224

            • "Incremental" load of text files
              Cristian Ivanoff

              Hello swuehl,

              Thank you for the tip. Im reading it right now. The sequence number I have on the filename would be the period? Each file has the filename as follows: Customer_201101, Customer_201102 and so on...

               

              I downloaded the example attached by you and will try to change my code accordingly.

               

              Br

              cristian

                • "Incremental" load of text files
                  Stefan Wühl

                  Yes, could just be the period, should work almost right out of the box.

                  Somewhere in the script I used subfield('$(File)','_',5) where 5 was indicating the position of the sequence number (string parts delimited by '_'), this should be 2 in your case (two subfields, Customer and period, delimited by '_', you want the second subfield).

                   

                  I like your idea using exists, but I would need to look a bit deeper into this.

                   

                  But, I don't think you can use exists() on a preceding load like above effectively (if at all).

                  Using a for each loop as in my code, you only touch the files that you are interested in. I think this should be advantegeous over using some records based decision which data to load.

                   

                  Hope this helps and have a nice weekend,

                  Stefan

              • "Incremental" load of text files
                Cristian Ivanoff

                Ok,

                Now this is working very good with swuehl's example in http://community.qlik.com/thread/49224.

                 

                But, , now I would like to add another file with a similar filename as my Customer file (same column names). So the folder has two kind of files and both have a period in the file name. How should I tweak my code to load both files?

                 

                Filelist is now:

                 

                Customer_201101

                Customer_201102

                ...

                Customer_sys_201101

                Customer_sys_201102

                ...

                 

                My code:

                 

                set vGSMAHSPATH='..\QVSource\TestFolder\';

                Set vQvdFile = 'Facts.qvd';

                Set vFTable = 'fTable';

                Set vSourceFile = 'Customer_';

                Let vQVDExists = if(FileSize('$(vQVD)$(vQvdFile)')>0,-1,0);

                 

                IF $(vQVDExists) THEN

                 

                          $(vFTable):

                          load * from $(vQVD)$(vQvdFile) (QVD);

                 

                ENDIF

                 

                IF $(vQVDExists) THEN

                 

                          maxseqTab:

                          LOAD max(FILEPERIOD) as maxSeq

                          FROM $(vQVD)$(vQvdFile) (qvd);

                 

                          LET vMaxSeq = peek('maxSeq');

                          DROP table maxseqTab;

                 

                ELSE                                                            // QVD does not exist

                          LET vMaxSeq = 0;           // No QVD. Force full reload

                ENDIF

                 

                For each File in filelist('$(vGSMAHSPATH)$(vSourceFile)*')

                 

                          IF left(right('$(File)',10),6) > $(vMaxSeq) THEN

                 

                          $(vFTable):

                                    load Distinct

                     Customer,

                     CustomerID,

                      FileName() as FILNAME,

                               left(right(FileName(),10),6) as FILEPERIOD

                                    FROM

                                    $(vGSMAHSPATH)$(vSourceFile)*

                                    (txt, codepage is 1252, embedded labels, delimiter is ';', msq);

                          ENDIF

                NEXT File

                 

                STORE $(vFTable) into $(vQVD)$(vQvdFile);