Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello all!
I have a problem loading textfiles.
Once a month I get a text file delivered to a folder (file name contains date/period). I want to make a qvd file that is loaded incremental with the textfiles so I made a qvd-file as an initial file. Then I load from the initial qvd file and concatenate with the new files. Last thing I do is to store the qvd file again with the new data. The thing is that I dont want to reload old files that are residing in the same folder, that is I want to exclude all duplicates.
First I load my QVD. I only use this script once to make the QVD.
InitQVD:
Load
CustomerName
,CustomerID
,filename() as FileName
from Customer_201101.spt
(txt, codepage is 1252, embedded labels, delimiter is ';', msq);
store InitQVD into InitQVD.qvd;
My script that I run each time is this:
Customer:
Load * from InitQVD.qvd (QVD);
concatenate
load * where not exists(FileName); // I dont want to reload files that already exists in the qvd file.
load *
,filename() as FileName
from Customer_*.spt
(txt, codepage is 1252, embedded labels, delimiter is ';', msq);
store Customer into InitQVD.qvd;
The result of this is that the document loads all text files all the time so the stored qvd is storing duplicates and more...
So, the question is, what am I doing wrong?
Is this the best way of doing this? Are there better ways of handling this? Database is not an option. Should I just reload all text files each time I get a new text file?
I tried to use the ADD function in partial load but it takes forever to load and I cant confirm that the 'where not exists(FileName)' works....
All suggestions are appreciated.
Br
cristian
Christian,
I don't have much time to look into that right now, but it seems similar to what I discussed today with Badr Harfoush. Maybe it also gives you kind of kickoff, though I notice that your setting is slightly different (no unique timestamp / sequence number in filename, right?):
Hello swuehl,
Thank you for the tip. Im reading it right now. The sequence number I have on the filename would be the period? Each file has the filename as follows: Customer_201101, Customer_201102 and so on...
I downloaded the example attached by you and will try to change my code accordingly.
Br
cristian
Yes, could just be the period, should work almost right out of the box.
Somewhere in the script I used subfield('$(File)','_',5) where 5 was indicating the position of the sequence number (string parts delimited by '_'), this should be 2 in your case (two subfields, Customer and period, delimited by '_', you want the second subfield).
I like your idea using exists, but I would need to look a bit deeper into this.
But, I don't think you can use exists() on a preceding load like above effectively (if at all).
Using a for each loop as in my code, you only touch the files that you are interested in. I think this should be advantegeous over using some records based decision which data to load.
Hope this helps and have a nice weekend,
Stefan
Ok,
Now this is working very good with swuehl's example in http://community.qlik.com/thread/49224.
But, , now I would like to add another file with a similar filename as my Customer file (same column names). So the folder has two kind of files and both have a period in the file name. How should I tweak my code to load both files?
Filelist is now:
Customer_201101
Customer_201102
...
Customer_sys_201101
Customer_sys_201102
...
My code:
set vGSMAHSPATH='..\QVSource\TestFolder\';
Set vQvdFile = 'Facts.qvd';
Set vFTable = 'fTable';
Set vSourceFile = 'Customer_';
Let vQVDExists = if(FileSize('$(vQVD)$(vQvdFile)')>0,-1,0);
IF $(vQVDExists) THEN
$(vFTable):
load * from $(vQVD)$(vQvdFile) (QVD);
ENDIF
IF $(vQVDExists) THEN
maxseqTab:
LOAD max(FILEPERIOD) as maxSeq
FROM $(vQVD)$(vQvdFile) (qvd);
LET vMaxSeq = peek('maxSeq');
DROP table maxseqTab;
ELSE // QVD does not exist
LET vMaxSeq = 0; // No QVD. Force full reload
ENDIF
For each File in filelist('$(vGSMAHSPATH)$(vSourceFile)*')
IF left(right('$(File)',10),6) > $(vMaxSeq) THEN
$(vFTable):
load Distinct
Customer,
CustomerID,
FileName() as FILNAME,
left(right(FileName(),10),6) as FILEPERIOD
FROM
$(vGSMAHSPATH)$(vSourceFile)*
(txt, codepage is 1252, embedded labels, delimiter is ';', msq);
ENDIF
NEXT File
STORE $(vFTable) into $(vQVD)$(vQvdFile);
I am travelling now.
If you use something like right(...,6) in your If ... then
you should be able to handle both.
Or use a logical OR in your IF then condition.