I'm hoping your brilliant minds can assist me here. I have the tried and true ScanFolder loop in an application to generate a QVD. The issue is now one of volume and speed, so I'd like to make it incremental. I'm struggling to do this on a level that keeps Qlik from having to look at every file to determine whether or not to load it, and instead I'd like to do it at the subdir level based on the Folder's Last Modified Date.
The general idea is if the folder's (from the first level of the subdirectory) last modified date is greater than say, a date in the QVD I will be concatenating this to, then look inside and load the files. If not, skip it entirely. There are upwards of 500 folders to loop through at the top level, with thousands more below that, and probably 10s of thousands of files. It takes about 10 minutes to load just one type of file, and there are many types, meaning this app takes a solid hour to generate the QVDs necessary.
I'm open to creative solutions, assuming they are somewhat bulletproof.
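For reference, the folder-level check described above might look something like the sketch below. This is only an outline: `vLastLoad` is a hypothetical variable holding the max date already stored in the QVD, and it assumes FileTime() returns a usable timestamp when given a directory path, which is worth verifying in your environment before relying on it.

```qlik
sub ScanFolder(Root)
    for each SubDirectory in dirlist( Root & '\*' )
        // Only descend into subdirectories modified since the last load.
        // ASSUMPTION: FileTime() on a directory path returns its last-modified
        // timestamp; vLastLoad is set earlier from the existing QVD.
        if FileTime('$(SubDirectory)') > '$(vLastLoad)' then
            call ScanFolder(SubDirectory)
        end if
    next SubDirectory
end sub
```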
sub ScanFolder(Root)
    for each FileExtension in 'csv'
        for each FoundFile in filelist( Root & '\Thing*.csv')
            tmp:
            LOAD
                '$(FoundFile)' as SourceFile,
                FileTime() as DATE_OF_TEST,
                @1 as STUFF
            From [$(FoundFile)]
            (txt, codepage is 28591, no labels, delimiter is ',', msq, header is 1 lines);
        next FoundFile
    next FileExtension
    for each SubDirectory in dirlist( Root & '\*' )
        call ScanFolder(SubDirectory)
    next SubDirectory
end sub

Call ScanFolder('lib://Drive/Place\');
If all you want is the filetime, the scan will be very fast. With this script, the system only reads the directory information; it never has to open any of the files:
sub ScanFolder(Root)
    for each FoundFile in filelist( Root & '\Thing*.csv')
        tmp:
        LOAD
            '$(FoundFile)' as SourceFile,
            FileTime('$(FoundFile)') as DATE_OF_TEST
        AutoGenerate 1;
    next FoundFile
    for each SubDirectory in dirlist( Root & '\*' )
        call ScanFolder(SubDirectory)
    next SubDirectory
end sub

Call ScanFolder('lib://Drive/Place\');
This should scan thousands of files and folders in a few seconds.
Ok, that seems like a viable solution if I can then use that list to load data only from the files with a filetime greater than that of the current QVD. What would that look like? Simply adding more fields to the existing load is exactly what I've done, and the iteration takes an extremely long time.
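One way to combine the fast scan with an incremental load might look like the sketch below. Everything here is illustrative: the table name FileList (assumed to be the output of the fast ScanFolder scan), the QVD path, and the field names are placeholders for whatever your app actually uses.

```qlik
// High-water mark: the latest filetime already present in the existing QVD.
MaxDate:
LOAD Max(DATE_OF_TEST) as MaxFileTime
FROM [lib://Drive/Existing.qvd] (qvd);
LET vLastLoad = Peek('MaxFileTime', 0, 'MaxDate');
DROP TABLE MaxDate;

// Keep only files newer than the high-water mark.
// FileList is assumed to come from the fast scan-only ScanFolder above.
NewFiles:
LOAD SourceFile, DATE_OF_TEST
RESIDENT FileList
WHERE DATE_OF_TEST > '$(vLastLoad)';
DROP TABLE FileList;

// Load data only from the new files...
FOR i = 0 TO NoOfRows('NewFiles') - 1
    LET vFile = Peek('SourceFile', $(i), 'NewFiles');
    Data:
    LOAD
        '$(vFile)' as SourceFile,
        FileTime() as DATE_OF_TEST,
        @1 as STUFF
    FROM [$(vFile)]
    (txt, codepage is 28591, no labels, delimiter is ',', msq, header is 1 lines);
NEXT i
DROP TABLE NewFiles;

// ...then concatenate the history and write the QVD back out.
Concatenate(Data)
LOAD * FROM [lib://Drive/Existing.qvd] (qvd);
STORE Data INTO [lib://Drive/Existing.qvd] (qvd);
```

The point of this shape is that the expensive file opens happen only for files past the high-water mark; the bulk of the history comes from the QVD, which Qlik reads far faster than the raw text files.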
EDIT: This script is still extremely slow. Any idea what might cause the slowdown? These are networked drives, but their performance during normal transfers is quite fast.