Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Join us in Bucharest on Sept 18th for Qlik's AI Reality Tour! Register Now
cancel
Showing results for 
Search instead for 
Did you mean: 
joey_lutes
Partner - Creator
Partner - Creator

Loop Through Directory Structure to load data

I have an elaborate load script (involving hash, flags, etc) that I need to run on every file found in a directory structure very similar to this one from csv files.

In a test load, I used Henric's  hic subroutine and called it - I guess I expected it to create the table 'FileList' which would be visible in the Data Model Viewer - which it is not.  The script runs without error, I'm just not sure how to implement it.

Here's the sub I used:

Set vConcatenate = ;

sub ScanFolder(Root)

          for each FileExtension in 'csv'

                    for each FoundFile in filelist( Root & '\*.' & FileExtension)

                              FileList:

                              $(vConcatenate)

                              LOAD *, '$(FoundFile)' as SourceFile

                              FROM [$(FoundFile)] (txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines);

                              Set vConcatenate = Concatenate;

                    next FoundFile

          next FileExtension

          for each SubDirectory in dirlist( Root & '\*' )

                    call ScanFolder(SubDirectory)

          next SubDirectory

end sub

Call ScanFolder('C:\datafiles') ;

In its most simplistic form, I have the following directory structure:

   C:\datafiles\20161101

          file1_20161101.csv

          file2_20161101.csv

    C:\datafiles\20161102

          file1_20161102.csv

          file2_20161102.csv

Once I have a filelist I need to do a few things with it:

For each FoundFile

1)  Extract the date.

2)  if filedate > LastLoadedFileDate,

3)  Load *, FileDate as FileDate

4)  Update vLastLoadedFileDate (set command?)

Next

So the end result would be comparable to:

For each FoundFile > vLastLoadedFileDate

File1Table:

      Load*,

     FileDate (probably wrapped in Date())

From: FoundFile.csv (File1_20161101.csv)

Join (File1Table)

     Load*,

     FileDate

From: FoundFile.csv (File1_20161102.csv)

etc.

Any assistance would be greatly appreciated!!

4 Replies
Gysbert_Wassenaar
Partner - Champion III
Partner - Champion III

Put this somewhere at the start of the script: SET vLastLoadedFileDate = ReloadTime();

And change the subroutine by adding an if statement to turn the name of the subfolder into a date and then compare it with the last reload time:

          for each SubDirectory in dirlist( Root & '\*' )

               if( date#('$(SubDirectory)','YYYYMMDD') > $(vLastLoadedFileDate) then

                    call ScanFolder(SubDirectory)

               endif

          next SubDirectory


talk is cheap, supply exceeds demand
joey_lutes
Partner - Creator
Partner - Creator
Author

Thanks so much!

Would this be an additional sub or should I integrate that / add it to the existing one?  Obviously I'll need to loop through the files as well....

Also, I'm still confused as to how this manifests - where my data actually gets loaded into a table, can you shed light on that?

Gysbert_Wassenaar
Partner - Champion III
Partner - Champion III

In this case 'change' means 'modify', not 'remove and replace with'.

The data from the files should get loaded by the subroutine into a table named FileList


talk is cheap, supply exceeds demand
joey_lutes
Partner - Creator
Partner - Creator
Author

Ok, thank you.

Here is my current code (I added the SET statement earlier)

Set vConcatenate = ;

sub ScanFolder(Root)

          for each FileExtension in 'csv'

                    for each FoundFile in filelist( Root & '\*.' & FileExtension)

                              FileList:

                              $(vConcatenate)

                              LOAD *, '$(FoundFile)' as SourceFile

                              FROM [$(FoundFile)] (txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines);

                              Set vConcatenate = Concatenate;

                    next FoundFile

          next FileExtension

          for each SubDirectory in dirlist( Root & '\*' )

               if( date#('$(SubDirectory)','YYYYMMDD') > $(vLastLoadedFileDate) then

                    call ScanFolder(SubDirectory)

               endif

          next SubDirectory

end sub

Call ScanFolder('C:\datafiles') ;

The script does indeed run, or seems to, with no errors however no table is generated.  Weird.  Files are present.

And - of course I need to complicate it more  

The name of the table should be the name of the file preceding the _ (file1_20161101 should create table file1:)

I would think I could get that value from a

subfield(FieldBaseName,'_',1) - just not sure how to insert that as the table name.


OUTPUT:

12:17:17 TT

App successfully saved.

---
Started loading data
---
---
App saved
---
Finished successfully
0 forced error(s)
0 synthetic key(s)
I really appreciate the help - this sub thing is new to me and I'm a bit lost.  I'm doing my due diligence on the net to try to learn.