Skip to main content
Announcements
Get Ready. A New Qlik Learning Experience is Coming February 17! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
prees959
Creator II
Creator II

Loop and Load excel files in a folder in reverse order

Hi,

Every day we receive a new Excel file with the dates contained in the filename eg, DataRun_20170927_20171127.xlsx

What I need to do is loop through the folder that these files are in and load with in order of the most recent one received with a

Where Not Exists on an ID field so in effect it is bringing in the most upto date records.

Is this possible?

Thanks,

Phil

1 Solution

Accepted Solutions
marcus_malinow
Partner - Specialist III
Partner - Specialist III

Hi Phil,

something like this would work.

Marcus

--------------------------------------------

for each file in FileList('$(vFolder)DataRun*.xlsx')

    TRACE found file $(file);

    TempFileList:

    LOAD

        subfield('$(file)','\', -1) as tempFileName,

        '$(file)' as tempFullFileName,

        right(subfield(subfield('$(file)','\', -1), '.', 1), 😎 as tempFileDate

    AUTOGENERATE 1;

next file

OrderedFileList:

NOCONCATENATE

LOAD

    tempFileName as FileName,

    tempFullFileName as FullFileName,

    tempFileDate as FileDatev2

RESIDENT TempFileList

ORDER BY tempFileDate desc //retrieve most recent files first

;

DROP TABLE TempFileList;

for n = 1 to NoOfRows('OrderedFileList')

    LET vCurrentFile = peek('FileName', n-1, 'OrderedFileList');

    LET vCurrentFileDateStamp = peek('FileDatev2', n-1, 'OrderedFileList');

   

    TRACE ;

    TRACE $(vCurrentFile);

    TRACE $(vCurrentFileDateStamp);

    Data:

    LOAD

    [field list]

    FROM [$(vFolder)$(vCurrentFile)]

    WHERE NOT Exists(PrimaryKeyField)

next n

if not isnull(QvdCreateTime('$(vQVDFolder)Data.QVD')) then

    CONCATENATE (Data)

    LOAD

    [field list]

    FROM [$(vQVDFolder)Data.QVD]

    WHERE NOT Exists(PrimaryKeyField)

end if

DROP TABLE OrderedFileList;

View solution in original post

16 Replies
Frank_Hartmann
Master II
Master II

quite not sure if i really understood your requirement.

But you could try  replacing parts of the locations by a variable (vPath) in order to dynamically catch the correct daily excel file

and combining this with incremental Load e.g.

Let vPath = 'DataRun_'&date(date#(Today()),'YYYYMMDD')&'_'&date(date#(Today())+60,'YYYY')&date(date#(Today())+60,'MM')&date(date#(Today()),'DD');

//historicData:

//LOAD ID,

//     DIM

//FROM

//(qvd);

//Concatenate

actualData:

LOAD *

FROM

(ooxml, embedded labels, table is Tabelle1);

Store historicData into (qvd);

hope this helps

prees959
Creator II
Creator II
Author

Thanks for you reply.

To explain it a bit better, Every day the number of excel files grows by one which I need to loop through eg,

As of yesterday's , I had these files in a folder :

data_20170921_20171121.xlsx

data_20170922_20171122.xlsx

data_20170923_20171123.xlsx

data_20170924_201711214xlsx

data_20170925_20171125.xlsx

data_20170926_20171126.xlsx


Today a 'data_20170927_20171127.xlsx' file will be dumped into the folder...

Which I have been  loading one-by-one into a single qvd called [all_data.qvd]. I have been doing it in reverse order 26th, 25th etc to get the most upto date data by using where not exists' on a key ID field.

Is there a automatic way I can loop through the folder ensuring that I pick the latest file first before the next one and using Where Not Exists on the key ID field

Phil

marcus_malinow
Partner - Specialist III
Partner - Specialist III

Hi Phil,

something like this would work.

Marcus

--------------------------------------------

for each file in FileList('$(vFolder)DataRun*.xlsx')

    TRACE found file $(file);

    TempFileList:

    LOAD

        subfield('$(file)','\', -1) as tempFileName,

        '$(file)' as tempFullFileName,

        right(subfield(subfield('$(file)','\', -1), '.', 1), 😎 as tempFileDate

    AUTOGENERATE 1;

next file

OrderedFileList:

NOCONCATENATE

LOAD

    tempFileName as FileName,

    tempFullFileName as FullFileName,

    tempFileDate as FileDatev2

RESIDENT TempFileList

ORDER BY tempFileDate desc //retrieve most recent files first

;

DROP TABLE TempFileList;

for n = 1 to NoOfRows('OrderedFileList')

    LET vCurrentFile = peek('FileName', n-1, 'OrderedFileList');

    LET vCurrentFileDateStamp = peek('FileDatev2', n-1, 'OrderedFileList');

   

    TRACE ;

    TRACE $(vCurrentFile);

    TRACE $(vCurrentFileDateStamp);

    Data:

    LOAD

    [field list]

    FROM [$(vFolder)$(vCurrentFile)]

    WHERE NOT Exists(PrimaryKeyField)

next n

if not isnull(QvdCreateTime('$(vQVDFolder)Data.QVD')) then

    CONCATENATE (Data)

    LOAD

    [field list]

    FROM [$(vQVDFolder)Data.QVD]

    WHERE NOT Exists(PrimaryKeyField)

end if

DROP TABLE OrderedFileList;

prees959
Creator II
Creator II
Author

Hi Marcus,

Thanks for this - it seems to start well then fails on :

Data:

    LOAD

    [field list]

    FROM [$(vFolder)$(vCurrentFile)]

    WHERE NOT Exists(PrimaryKeyField)

next n

and produces the following error :

Cannot open file 'C:\QVAPPS\Data\DataRun_20170926_20171126.xlsx

    WHERE NOT EXISTS' The filename, directory name, or volume label syntax is incorrect.

Data:

    LOAD

    [field list]

    FROM

    WHERE NOT EXISTS([ID])



I have exit'ed the script just before this step and the vFolder and vCurrentFile have the correct path and filename of the most recent file.

Any ideas?

Really grateful for this!

Phil

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

It looks to me like you are missing a ']' closing brace.  What does the log file show?

-Rob

prieper
Master II
Master II

Hi Phil,

think that you need a further specifyer for the excel, like

Data:

    LOAD

    ID,

     ......

    FROM

     (biff, embedded labels, table is Sheet1$)

    WHERE NOT EXISTS([ID])

marcus_malinow
Partner - Specialist III
Partner - Specialist III

Hi Phil,

my bad,

you need to add the format specification in the xlsx load. This will be sopmethign like this:

    FROM [$(vFolder)$(vCurrentFile)]

    (ooxml, embedded labels, table is 'Sheet1')

    WHERE NOT Exists(PrimaryKeyField)

I'd suggest building the xlsx part of the load script by setting up a load using the wizard, then copying into my code (substituting in variables where relevant)

Marcus

prees959
Creator II
Creator II
Author

Hi Marcus,

Much appreciated for this - I didnt spot that !  Is there a way of leaving out the ", table is 'Sheet1'" section as I have  noticed that the sheet names never have the same name ?

Many thanks!!

Phil,

marcus_malinow
Partner - Specialist III
Partner - Specialist III

Hi Phil,

yes there is a way. You'll need the Excel ODBC driver installed though.

Just prior to the Excel load, try something like this:

ODBC CONNECT TO [Excel Files;DBQ=$(vFolder)$(vCurrentFile)];

//use SQLTables to assemble a list of all the sheets in this Excel file

XLSSheetList:

SQLTables

;

   

DISCONNECT; //from excel

LET vXLSXSheet = peek('TABLE_NAME', 0, 'XLSSheetList');

DROP TABLE XLSSheetList;

Then substitute in your vXLSXSheet in your subsequent load statement

Marcus