Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
mr_novice
Creator II
Creator II

"Incremental" load of text files

Hello all!

I have a problem loading textfiles.

Once a month I get a text file delivered to a folder (file name contains date/period). I want to make a qvd file that is loaded incremental with the textfiles so I made a qvd-file as an initial file. Then I load from the initial qvd file and concatenate with the new files. Last thing I do is to store the qvd file again with the new data. The thing is that I dont want to reload old files that are residing in the same folder, that is I want to exclude all duplicates.

First I load my QVD. I only use this script once to make the QVD.

InitQVD:

Load

CustomerName

,CustomerID

,filename() as FileName

from Customer_201101.spt

(txt, codepage is 1252, embedded labels, delimiter is ';', msq);

store InitQVD into InitQVD.qvd;

My script that I run each time is this:

Customer:

Load * from InitQVD.qvd (QVD);

concatenate

load * where not exists(FileName); // I dont want to reload files that already exists in the qvd file.

load *

,filename() as FileName

from Customer_*.spt

(txt, codepage is 1252, embedded labels, delimiter is ';', msq);


store Customer into InitQVD.qvd;

The result of this is that the document loads all text files all the time so the stored qvd is storing duplicates and more...

So, the question is, what am I doing wrong?

Is this the best way of doing this? Are there better ways of handling this? Database is not an option. Should I just reload all text files each time I get a new text file?

I tried to use the ADD function in partial load but it takes forever to load and I cant confirm that the 'where not exists(FileName)' works....

All suggestions are appreciated.

Br

cristian

5 Replies
swuehl
MVP
MVP

Christian,

I don't have much time to look into that right now, but it seems similar to what I discussed today with Badr Harfoush. Maybe it also gives you kind of kickoff, though I notice that your setting is slightly different (no unique timestamp / sequence number in filename, right?):

http://community.qlik.com/thread/49224

mr_novice
Creator II
Creator II
Author

Hello swuehl,

Thank you for the tip. Im reading it right now. The sequence number I have on the filename would be the period? Each file has the filename as follows: Customer_201101, Customer_201102 and so on...

I downloaded the example attached by you and will try to change my code accordingly.

Br

cristian

swuehl
MVP
MVP

Yes, could just be the period, should work almost right out of the box.

Somewhere in the script I used subfield('$(File)','_',5) where 5 was indicating the position of the sequence number (string parts delimited by '_'), this should be 2 in your case (two subfields, Customer and period, delimited by '_', you want the second subfield).

I like your idea using exists, but I would need to look a bit deeper into this.

But, I don't think you can use exists() on a preceding load like above effectively (if at all).

Using a for each loop as in my code, you only touch the files that you are interested in. I think this should be advantegeous over using some records based decision which data to load.

Hope this helps and have a nice weekend,

Stefan

mr_novice
Creator II
Creator II
Author

Ok,

Now this is working very good with swuehl's example in http://community.qlik.com/thread/49224.

But, , now I would like to add another file with a similar filename as my Customer file (same column names). So the folder has two kind of files and both have a period in the file name. How should I tweak my code to load both files?

Filelist is now:

Customer_201101

Customer_201102

...

Customer_sys_201101

Customer_sys_201102

...

My code:

set vGSMAHSPATH='..\QVSource\TestFolder\';

Set vQvdFile = 'Facts.qvd';

Set vFTable = 'fTable';

Set vSourceFile = 'Customer_';

Let vQVDExists = if(FileSize('$(vQVD)$(vQvdFile)')>0,-1,0);

IF $(vQVDExists) THEN

          $(vFTable):

          load * from $(vQVD)$(vQvdFile) (QVD);

ENDIF

IF $(vQVDExists) THEN

          maxseqTab:

          LOAD max(FILEPERIOD) as maxSeq

          FROM $(vQVD)$(vQvdFile) (qvd);

          LET vMaxSeq = peek('maxSeq');

          DROP table maxseqTab;

 

ELSE                                                            // QVD does not exist

          LET vMaxSeq = 0;           // No QVD. Force full reload

ENDIF

For each File in filelist('$(vGSMAHSPATH)$(vSourceFile)*')

 

          IF left(right('$(File)',10),6) > $(vMaxSeq) THEN

 

          $(vFTable):

                    load Distinct

     Customer,

     CustomerID,

      FileName() as FILNAME,

               left(right(FileName(),10),6) as FILEPERIOD

                    FROM

                    $(vGSMAHSPATH)$(vSourceFile)*

                    (txt, codepage is 1252, embedded labels, delimiter is ';', msq);

          ENDIF

NEXT File

STORE $(vFTable) into $(vQVD)$(vQvdFile);

swuehl
MVP
MVP

I am travelling now.

If you use something like right(...,6) in your If ... then

you should be  able to handle both.

Or use a logical OR in your IF then condition.