4 Replies Latest reply: Nov 18, 2016 12:27 PM by Joey Lutes RSS

    Loop Through Directory Structure to load data

    Joey Lutes

      I have an elaborate load script (involving hash, flags, etc) that I need to run on every file found in a directory structure very similar to this one from csv files.

       

      In a test load, I used Henric's  hic subroutine and called it - I guess I expected it to create the table 'FileList' which would be visible in the Data Model Viewer - which it is not.  The script runs without error, I'm just not sure how to implement it.

       

      Here's the sub I used:

      Set vConcatenate = ;

      sub ScanFolder(Root)

                for each FileExtension in 'csv'

                          for each FoundFile in filelist( Root & '\*.' & FileExtension)

                                    FileList:

                                    $(vConcatenate)

                                    LOAD *, '$(FoundFile)' as SourceFile

                                    FROM [$(FoundFile)] (txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines);

                                    Set vConcatenate = Concatenate;

                          next FoundFile

                next FileExtension

                for each SubDirectory in dirlist( Root & '\*' )

                          call ScanFolder(SubDirectory)

                next SubDirectory

      end sub

      Call ScanFolder('C:\datafiles') ;

       

      In its most simplistic form, I have the following directory structure:

         C:\datafiles\20161101

                file1_20161101.csv

                file2_20161101.csv

          C:\datafiles\20161102

                file1_20161102.csv

                file2_20161102.csv

       

      Once I have a filelist I need to do a few things with it:

      For each FoundFile

      1)  Extract the date.

      2)  if filedate > LastLoadedFileDate,

      3)  Load *, FileDate as FileDate

      4)  Update vLastLoadedFileDate (set command?)

      Next

       

      So the end result would be comparable to:

      For each FoundFile > vLastLoadedFileDate

      File1Table:

            Load*,

           FileDate (probably wrapped in Date())

      From: FoundFile.csv (File1_20161101.csv)

       

      Join (File1Table)

           Load*,

           FileDate

      From: FoundFile.csv (File1_20161102.csv)

       

      etc.

       

      Any assistance would be greatly appreciated!!

        • Re: Loop Through Directory Structure to load data
          Gysbert Wassenaar

          Put this somewhere at the start of the script: SET vLastLoadedFileDate = ReloadTime();

           

          And change the subroutine by adding an if statement to turn the name of the subfolder into a date and then compare it with the last reload time:

           

                    for each SubDirectory in dirlist( Root & '\*' )

                         if( date#('$(SubDirectory)','YYYYMMDD') > $(vLastLoadedFileDate) then

                              call ScanFolder(SubDirectory)

                         endif

                    next SubDirectory

            • Re: Loop Through Directory Structure to load data
              Joey Lutes

              Thanks so much!

              Would this be an additional sub or should I integrate that / add it to the existing one?  Obviously I'll need to loop through the files as well....

              Also, I'm still confused as to how this manifests - where my data actually gets loaded into a table, can you shed light on that?

                • Re: Loop Through Directory Structure to load data
                  Gysbert Wassenaar

                  In this case 'change' means 'modify', not 'remove and replace with'.

                   

                  The data from the files should get loaded by the subroutine into a table named FileList

                    • Re: Loop Through Directory Structure to load data
                      Joey Lutes

                      Ok, thank you.

                      Here is my current code (I added the SET statement earlier)

                      Set vConcatenate = ;

                      sub ScanFolder(Root)

                                for each FileExtension in 'csv'

                                          for each FoundFile in filelist( Root & '\*.' & FileExtension)

                                                    FileList:

                                                    $(vConcatenate)

                                                    LOAD *, '$(FoundFile)' as SourceFile

                                                    FROM [$(FoundFile)] (txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines);

                                                    Set vConcatenate = Concatenate;

                                          next FoundFile

                                next FileExtension

                                for each SubDirectory in dirlist( Root & '\*' )

                                     if( date#('$(SubDirectory)','YYYYMMDD') > $(vLastLoadedFileDate) then

                                          call ScanFolder(SubDirectory)

                                     endif

                                next SubDirectory

                      end sub

                       

                       

                       

                      Call ScanFolder('C:\datafiles') ;

                       

                      The script does indeed run, or seems to, with no errors however no table is generated.  Weird.  Files are present.

                      And - of course I need to complicate it more  

                      The name of the table should be the name of the file preceding the _ (file1_20161101 should create table file1:)

                      I would think I could get that value from a

                      subfield(FieldBaseName,'_',1) - just not sure how to insert that as the table name.


                      OUTPUT:

                      12:17:17 TT

                      App successfully saved.

                      ---
                      Started loading data
                      ---
                      ---
                      App saved
                      ---
                      Finished successfully
                      0 forced error(s)
                      0 synthetic key(s)
                      I really appreciate the help - this sub thing is new to me and I'm a bit lost.  I'm doing my due diligence on the net to try to learn.