4 Replies. Latest reply: Aug 23, 2011 9:51 AM by orajuvaa

    Distinct load with updating data

      Hi,

      My dashboard currently loads a folder full of monthly Excel reports from a network share. Each row has a numeric identifier that I want to be unique in the final loaded data. The problem is that the same ID can occur again in later reports, where other fields on that row have been updated. A plain DISTINCT load doesn't help: it only drops rows that are identical in every field, and keeping only the first row loaded for each ID leaves my data partly out of date, because that row often comes from an older report.

      The reports follow the naming convention Report-YYMM, which could perhaps be used to determine the load order. The only other option I can think of is renaming the files so that they load from newest to oldest.
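
      A minimal sketch of the Report-YYMM idea, assuming every file is named Report-YYMM.xls (for example Report-1107.xls) and that all reports fall within the same century, so a larger YYMM always means a newer report. E:\SomeDirectory is a placeholder path, and ReportMonth is a hypothetical field name:

      ```qlikview
      // Placeholder path: replace with the real network share
      set vDirectory = 'E:\SomeDirectory';

      // One row per report: full path plus the YYMM part of its name as a number
      for each vFoundFile in FileList('$(vDirectory)\Report-*.xls')

       Files:
       Load
       '$(vFoundFile)' as FileName,
       // Text between 'Report-' and the extension dot, e.g. '1107' -> 1107
       Num#(TextBetween('$(vFoundFile)', 'Report-', '.')) as ReportMonth
       AutoGenerate 1;

      next vFoundFile

      // Newest report first
      SortedFiles:
      Load
      FileName as FileNameSorted
      Resident Files
      Order By ReportMonth desc;

      drop table Files;
      ```

      SortedFiles can then drive a loop that loads each report in that order with Where not Exists(ID), so the newest version of each row is the one that is kept.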

      Any ideas?

        • Distinct load with updating data
          Liron Baram

          Hi,

          You can use FileTime(), which returns the time a file was last modified, to sort the files.
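
          A minimal sketch of FileTime() on a single file; the path and file name are hypothetical. Keep in mind that a file's modified time changes whenever it is saved again, so an old report that is edited later would sort as the newest:

          ```qlikview
          // Hypothetical report path
          set vFile = 'E:\SomeDirectory\Report-1107.xls';

          // FileTime() returns the file's last-modified time;
          // Timestamp() formats it as readable text
          let vModified = Timestamp(FileTime('$(vFile)'));

          // Shows the result in the script execution progress window
          trace $(vFile) was last modified $(vModified);
          ```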

            • Distinct load with updating data

              Thanks for the suggestion! I'd still have to loop through all of the files in the folder to read each file's modified date and then load them in sorted order. I'm not sure how to do that in the load script.

                • Re: Distinct load with updating data
                  Tanel Rüütli

                  I think the best way is to loop over the Excel files from newest to oldest, using "not Exists(ID)" to skip older duplicates of IDs that have already been loaded.

                  Here is some script you can try:

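                  // Get the list of Excel files in a directory
                  set vDirectory = 'E:\SomeDirectory';
                  
                  // One row per file: full path and last-modified time
                  for each vFoundFile in FileList('$(vDirectory)\*.xls')
                  
                   Files:
                   Load
                   '$(vFoundFile)' as FileName,
                   FileTime('$(vFoundFile)') as FileTimestamp
                   AutoGenerate 1;
                  
                  next vFoundFile
                  
                  // Sort files by timestamp, newest first
                  SortedFiles:
                  Load
                  FileName as FileNameSorted
                  Resident Files
                  Order By FileTimestamp desc;
                  
                  // Load data from the Excel files listed in SortedFiles, newest first.
                  // not Exists(IDField) skips any ID already loaded from a newer file.
                  for i = 0 to NoOfRows('SortedFiles') - 1
                  
                   let iExcelFile = Peek('FileNameSorted', i, 'SortedFiles');
                  
                   // embedded labels: the first row of the sheet holds the field names
                   Data:
                   Load
                   IDField, Field1, Field2
                   FROM
                   [$(iExcelFile)] (biff, embedded labels, table is [SheetName$])
                   Where not Exists(IDField);
                  
                  next i
                  
                  // Remove the helper tables once the data is loaded
                  drop tables Files, SortedFiles;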
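
                  To see why loading newest first with not Exists() keeps the latest version of each ID, here is a tiny self-contained sketch in which two inline tables stand in for two reports (the values are made up):

                  ```qlikview
                  // Newest report, loaded first
                  Data:
                  Load * Inline [
                  IDField,Field1
                  1,new
                  2,new
                  ];

                  // Older report: ID 1 already exists, so its outdated row is skipped;
                  // ID 3 has not been loaded yet, so its row is kept
                  Concatenate (Data)
                  Load * Inline [
                  IDField,Field1
                  1,old
                  3,old
                  ] Where not Exists(IDField);
                  ```

                  After the reload, Data holds (1, new), (2, new) and (3, old). Note that Exists(IDField) with a single argument checks every value of IDField loaded so far, in any table, so an unrelated table with an IDField column loaded earlier in the script would also suppress rows.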