3 Replies Latest reply: Mar 25, 2013 12:18 PM by Curt Daughtry RSS

    how to Load only the latest files?

    Vidit Saxena

      Hi All,

       

      i need some help for the following.

      i recieve data as excel sheets on monthly basis that have name in the followong format.

      YYYYMMDD_data.xls

       

      now all these files are saved in a specific folder. While loading the data i want to load the latest 2 files. e.g. if i have the following files

       

      20130301_data.xls

      20130201_data.xls

      20130101_data.xls.

      20121201_data,xls

       

      than only the 20130301_data.xls and 20130201_data.xls should be loaded?

        • Re: how to Load only the latest files?
          Michael Solomovich

          It could be something like this (please test, I can miss/mistype):

           

          Files_src:
          LOAD FileName() as File 
          FROM [*.xlsx] (ooxml, explicit labels, table is [Sheet1]);

           

          File:
          LOAD DISTINCT
          File,
          subfield(File, '_', 1) as FileDate
          RESIDENT Files_src
          DROP TABLE Files_src;

           

          LatestFiles:
          FIRST 2
          LOAD DISTINCT
          File as LatestTwoFiles
          RESIDENT File
          ORDER BY FileDate DESC;
          DROP TABLE File;

           

          LET LatestFile1 = peek('LatestFile',0,'LatestFiles');
          LET LatestFile2 = peek('LatestFile',1,'LatestFiles');
          DROP TABLE LatestFile;

           

          //-------------------------------

          Data:
          LOAD
          ...
          RESIDENT FROM [$(LatestFile1).xlsx] (ooxml, explicit labels, table is [Sheet1]);
          LOAD
          ...
          RESIDENT FROM [$(LatestFile2).xlsx] (ooxml, explicit labels, table is [Sheet1]);

           

          Hope it helps.

           

          Regards,
          Michael

           

          Edit: In fact, I used this approach recently, except I needed only one latest file.

          • Re: how to Load only the latest files?
            Curt Daughtry

            Try this. Copy and paste into test qvw application and set folder reference accordingly. Attached in text file also for reference.

             

            set folder = "E:"    // set the folder to the desired file location - leave off everything following \filename.file-extension
            ;
            for each file in filelist('$(folder)\*.xlsx')
            FileInfo:
            Load
            Distinct
            FileName() as file_name,
            FileBaseName() as file_base_name,
            FilePath() as file_path,
            FileTime() as file_time
            from 
            $(file)
            (ooxml, embedded labels, table is Sheet1)    // change if needed for different table file format
            ;
            next file;
            Qualify *
            ;
            LatestFiles:
            First(2)  // set this number to the number of recent files that you want to load
            Load *
            resident FileInfo
            order by file_time DESC
            ;
            Drop Table FileInfo
            ;
            UNQUALIFY *
            ;
            Let source = fieldname(3, 'LatestFiles')
            ;
            Let source_i = fieldvalue('$(source)',1)
            ;
            For i = 1 to NoofRows('LatestFiles')
            [File$(i)]:
            Load *,
            '$(source_i)' as $(source)
            from 
            $(source_i)
            (ooxml, embedded labels, table is Sheet1)    // change if needed for different table file format
            ;
            Let source_i = fieldvalue('$(source)',$(i)+1);
            ;
            next i
            exit script
            ;