rgoldenvelodyne
Contributor

Load from Folder Only when Modified Date is Greater than Max Date in QVD

I'm hoping your brilliant minds can assist me here. I have the tried and true ScanFolder loop in an application to generate a QVD. The issue is now one of volume and speed, so I'd like to make it incremental. I'm struggling to do this on a level that keeps Qlik from having to look at every file to determine whether or not to load it, and instead I'd like to do it at the subdir level based on the Folder's Last Modified Date.

The general idea is that if the last modified date of a folder (at the first level of subdirectories) is greater than, say, a date in the QVD I will be concatenating to, then look inside and load the files. If not, skip it entirely. There are upwards of 500 folders to loop through at the top level, with thousands more below that, and probably tens of thousands of files. It takes about 10 minutes to load just one type of file, and there are many types, meaning this app takes a solid hour to generate the necessary QVDs.

I'm open to creative solutions, assuming they are somewhat bulletproof.

sub ScanFolder(Root)

	for each FileExtension in 'csv'
		for each FoundFile in filelist( Root & '\Thing*.csv')
			tmp:
			LOAD '$(FoundFile)' as SourceFile,
				FileTime() as DATE_OF_TEST,
				@1 as STUFF
			From [$(FoundFile)] (txt, codepage is 28591, no labels, delimiter is ',', msq, header is 1 lines);
		next FoundFile
	next FileExtension

	for each SubDirectory in dirlist( Root & '\*' )
		call ScanFolder(SubDirectory)
	next SubDirectory
end sub

Call ScanFolder('lib://Drive/Place\') ;
2 Replies
jonathandienst
Partner - Champion III

If you want to get the filetime, then the scan will be very fast. Using this script, all the system does is read the directory information. It is not necessary to open any files:

sub ScanFolder(Root)
	for each FoundFile in filelist( Root & '\Thing*.csv')
	
		tmp:
		LOAD '$(FoundFile)' as SourceFile,
			FileTime('$(FoundFile)') as DATE_OF_TEST
		AutoGenerate 1;
	next FoundFile

	for each SubDirectory in dirlist( Root & '\*' )
		call ScanFolder(SubDirectory)
	next SubDirectory
end sub

Call ScanFolder('lib://Drive/Place\') ;

This should scan thousands of files and folders in a few seconds.

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
rgoldenvelodyne
Contributor
Author

Ok, well, that seems like a viable solution if I can then use that list to actually load data from only the files with a filetime greater than that of the current QVD. What would that look like? Simply adding more fields to the existing script is exactly what I've done, and the iteration takes an extremely long time.

EDIT: This script is still extremely slow. Any idea what may cause this slowdown? These are networked drives; however, their performance during transfers is normally quite fast.
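
For the question above of loading only the files newer than what is already in the QVD, a minimal sketch could look like the following. It is an illustration under assumptions, not a tested solution: the QVD path in vQvdPath is hypothetical, and the QVD is assumed to hold the same three fields as the original script (SourceFile, DATE_OF_TEST, STUFF). It still checks each file's timestamp individually, so it does not implement the folder-level skip asked about in the first post; it only avoids opening files that have not changed.

```qlik
// Hypothetical QVD location; adjust to wherever the QVD is actually stored.
Let vQvdPath = 'lib://Drive/Output/Stuff.qvd';

// 1. Find the newest file time already stored in the QVD (0 if there is no QVD yet).
Let vMaxTime = 0;
If Not IsNull(QvdCreateTime('$(vQvdPath)')) Then
	MaxTime:
	LOAD Max(DATE_OF_TEST) as MaxTime
	From [$(vQvdPath)] (qvd);
	// Alt() falls back to 0 if the QVD is empty and Max() returns null.
	Let vMaxTime = Alt(Peek('MaxTime', 0, 'MaxTime'), 0);
	DROP Table MaxTime;
End If

// 2. Walk the folders, but only open files whose timestamp is newer than vMaxTime.
Sub ScanFolder(Root)
	For Each FoundFile in FileList(Root & '\Thing*.csv')
		// FileTime with a file name is evaluated without a LOAD from the file.
		If FileTime('$(FoundFile)') > vMaxTime Then
			NewData:
			LOAD '$(FoundFile)' as SourceFile,
				FileTime() as DATE_OF_TEST,
				@1 as STUFF
			From [$(FoundFile)] (txt, codepage is 28591, no labels, delimiter is ',', msq, header is 1 lines);
		End If
	Next FoundFile

	For Each SubDirectory in DirList(Root & '\*')
		Call ScanFolder(SubDirectory)
	Next SubDirectory
End Sub

Call ScanFolder('lib://Drive/Place\');

// 3. If anything new was found, append the old rows and rewrite the QVD.
If NoOfRows('NewData') > 0 Then
	If Not IsNull(QvdCreateTime('$(vQvdPath)')) Then
		// Skip old rows for files that were just reloaded, so a modified
		// file replaces its earlier rows instead of duplicating them.
		Concatenate (NewData)
		LOAD SourceFile, DATE_OF_TEST, STUFF
		From [$(vQvdPath)] (qvd)
		Where Not Exists(SourceFile);
	End If
	STORE NewData INTO [$(vQvdPath)] (qvd);
End If
```

The comparison uses the variable name vMaxTime directly rather than a dollar-sign expansion, which sidesteps decimal-separator problems when a timestamp is expanded as text. Whether this is fast enough on a networked drive would need to be measured, since every file's directory entry is still read once per reload.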