Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
Every day we receive a new Excel file with the dates contained in the filename eg, DataRun_20170927_20171127.xlsx
What I need to do is loop through the folder that these files are in and load with in order of the most recent one received with a
Where Not Exists on an ID field so in effect it is bringing in the most upto date records.
Is this possible?
Thanks,
Phil
Hi Phil,
something like this would work.
Marcus
--------------------------------------------
for each file in FileList('$(vFolder)DataRun*.xlsx')
TRACE found file $(file);
TempFileList:
LOAD
subfield('$(file)','\', -1) as tempFileName,
'$(file)' as tempFullFileName,
right(subfield(subfield('$(file)','\', -1), '.', 1), 😎 as tempFileDate
AUTOGENERATE 1;
next file
OrderedFileList:
NOCONCATENATE
LOAD
tempFileName as FileName,
tempFullFileName as FullFileName,
tempFileDate as FileDatev2
RESIDENT TempFileList
ORDER BY tempFileDate desc //retrieve most recent files first
;
DROP TABLE TempFileList;
for n = 1 to NoOfRows('OrderedFileList')
LET vCurrentFile = peek('FileName', n-1, 'OrderedFileList');
LET vCurrentFileDateStamp = peek('FileDatev2', n-1, 'OrderedFileList');
TRACE ;
TRACE $(vCurrentFile);
TRACE $(vCurrentFileDateStamp);
Data:
LOAD
[field list]
FROM [$(vFolder)$(vCurrentFile)]
WHERE NOT Exists(PrimaryKeyField)
next n
if not isnull(QvdCreateTime('$(vQVDFolder)Data.QVD')) then
CONCATENATE (Data)
LOAD
[field list]
FROM [$(vQVDFolder)Data.QVD]
WHERE NOT Exists(PrimaryKeyField)
end if
DROP TABLE OrderedFileList;
quite not sure if i really understood your requirement.
But you could try replacing parts of the locations by a variable (vPath) in order to dynamically catch the correct daily excel file
and combining this with incremental Load e.g.
Let vPath = 'DataRun_'&date(date#(Today()),'YYYYMMDD')&'_'&date(date#(Today())+60,'YYYY')&date(date#(Today())+60,'MM')&date(date#(Today()),'DD');
//historicData:
//LOAD ID,
// DIM
//FROM
//
(qvd); //Concatenate
actualData:
LOAD *
FROM
(ooxml, embedded labels, table is Tabelle1);
Store historicData into
(qvd);
hope this helps
Thanks for you reply.
To explain it a bit better, Every day the number of excel files grows by one which I need to loop through eg,
As of yesterday's , I had these files in a folder :
data_20170921_20171121.xlsx
data_20170922_20171122.xlsx
data_20170923_20171123.xlsx
data_20170924_201711214xlsx
data_20170925_20171125.xlsx
data_20170926_20171126.xlsx
Today a 'data_20170927_20171127.xlsx' file will be dumped into the folder...
Which I have been loading one-by-one into a single qvd called [all_data.qvd]. I have been doing it in reverse order 26th, 25th etc to get the most upto date data by using where not exists' on a key ID field.
Is there a automatic way I can loop through the folder ensuring that I pick the latest file first before the next one and using Where Not Exists on the key ID field
Phil
Hi Phil,
something like this would work.
Marcus
--------------------------------------------
for each file in FileList('$(vFolder)DataRun*.xlsx')
TRACE found file $(file);
TempFileList:
LOAD
subfield('$(file)','\', -1) as tempFileName,
'$(file)' as tempFullFileName,
right(subfield(subfield('$(file)','\', -1), '.', 1), 😎 as tempFileDate
AUTOGENERATE 1;
next file
OrderedFileList:
NOCONCATENATE
LOAD
tempFileName as FileName,
tempFullFileName as FullFileName,
tempFileDate as FileDatev2
RESIDENT TempFileList
ORDER BY tempFileDate desc //retrieve most recent files first
;
DROP TABLE TempFileList;
for n = 1 to NoOfRows('OrderedFileList')
LET vCurrentFile = peek('FileName', n-1, 'OrderedFileList');
LET vCurrentFileDateStamp = peek('FileDatev2', n-1, 'OrderedFileList');
TRACE ;
TRACE $(vCurrentFile);
TRACE $(vCurrentFileDateStamp);
Data:
LOAD
[field list]
FROM [$(vFolder)$(vCurrentFile)]
WHERE NOT Exists(PrimaryKeyField)
next n
if not isnull(QvdCreateTime('$(vQVDFolder)Data.QVD')) then
CONCATENATE (Data)
LOAD
[field list]
FROM [$(vQVDFolder)Data.QVD]
WHERE NOT Exists(PrimaryKeyField)
end if
DROP TABLE OrderedFileList;
Hi Marcus,
Thanks for this - it seems to start well then fails on :
Data:
LOAD
[field list]
FROM [$(vFolder)$(vCurrentFile)]
WHERE NOT Exists(PrimaryKeyField)
next n
and produces the following error :
Cannot open file 'C:\QVAPPS\Data\DataRun_20170926_20171126.xlsx
WHERE NOT EXISTS' The filename, directory name, or volume label syntax is incorrect.
Data:
LOAD
[field list]
FROM
WHERE NOT EXISTS([ID])
I have exit'ed the script just before this step and the vFolder and vCurrentFile have the correct path and filename of the most recent file.
Any ideas?
Really grateful for this!
Phil
It looks to me like you are missing a ']' closing brace. What does the log file show?
-Rob
Hi Phil,
think that you need a further specifyer for the excel, like
Data:
LOAD
ID,
......
FROM
(biff, embedded labels, table is Sheet1$)
WHERE NOT EXISTS([ID])
Hi Phil,
my bad,
you need to add the format specification in the xlsx load. This will be sopmethign like this:
FROM [$(vFolder)$(vCurrentFile)]
(ooxml, embedded labels, table is 'Sheet1')
WHERE NOT Exists(PrimaryKeyField)
I'd suggest building the xlsx part of the load script by setting up a load using the wizard, then copying into my code (substituting in variables where relevant)
Marcus
Hi Marcus,
Much appreciated for this - I didnt spot that ! Is there a way of leaving out the ", table is 'Sheet1'" section as I have noticed that the sheet names never have the same name ?
Many thanks!!
Phil,
Hi Phil,
yes there is a way. You'll need the Excel ODBC driver installed though.
Just prior to the Excel load, try something like this:
ODBC CONNECT TO [Excel Files;DBQ=$(vFolder)$(vCurrentFile)];
//use SQLTables to assemble a list of all the sheets in this Excel file
XLSSheetList:
SQLTables
;
DISCONNECT; //from excel
LET vXLSXSheet = peek('TABLE_NAME', 0, 'XLSSheetList');
DROP TABLE XLSSheetList;
Then substitute in your vXLSXSheet in your subsequent load statement
Marcus