Anonymous
Not applicable

How to filter files based on their creation/modification date?

Hi,

I'm trying to retrieve files from a remote folder, filtering them by their creation/modification date.

Is there a way to do this with a tFTPGet component and a regexp on the filemask, using the mtime property?

I know there is a tFTPFileProperties component, but the remote folder in question contains about 250,000 files and I only want to retrieve yesterday's files, so browsing the whole folder can be time-consuming.

If not, are there any other ways?

Thank you very much!

 

Nicolas


Accepted Solutions
Anonymous
Not applicable
Author

First, a word of warning from experience: a single directory that accumulates tons of files every day will cause headaches. Eventually, OS tools start to break; I've seen directories like this where even "ls" breaks, which is super annoying. I know it will be difficult to change, but reorganizing that remote folder is something you should seriously consider before the problems become more serious.

 

With that said, here's an approach that may work out for you:

First, you'll need to maintain a list of the file names you've already seen. This list should be updated every time you process a file.

Once you have this list, you can generate a filemask regex that filters the file names down to only the files you have not seen. This regex would look something like:

^(?!(file1|file2|file3)$)

You should then be left with a much smaller set of files, which you can filter further with a tFTPFileProperties if needed.
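To make the idea concrete, here is a minimal sketch in Python of how such an exclusion regex could be generated from the list of seen names. The function name `build_unseen_filemask` and the sample names are hypothetical, and the trailing `.*$` is my addition so the pattern also works when the whole file name has to match; whether tFTPGet needs that, and how well it copes with a very long alternation, is something I have not verified.

```python
import re


def build_unseen_filemask(seen_names):
    """Return a regex string matching any file name NOT in seen_names.

    Hypothetical helper: names are escaped so that characters such as '.'
    in "data.csv" are treated literally rather than as regex syntax.
    """
    if not seen_names:
        # Nothing seen yet, so every name is a candidate.
        return r"^.*$"
    alternation = "|".join(re.escape(name) for name in sorted(seen_names))
    # Negative lookahead rejects exact matches of seen names only;
    # "file10" is still accepted even though "file1" has been seen.
    return rf"^(?!(?:{alternation})$).*$"


seen = {"file1", "file2", "file3"}
mask = build_unseen_filemask(seen)
pattern = re.compile(mask)

candidates = ["file1", "file2", "file4", "file10"]
unseen = [name for name in candidates if pattern.fullmatch(name)]
print(mask)    # ^(?!(?:file1|file2|file3)$).*$
print(unseen)  # ['file4', 'file10']
```

The generated string would then be pasted (or passed via a context variable) into the filemask field. The escaping here follows Python's rules, so it is worth checking the result against the regex flavor your job actually uses.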

 

This is not an easy problem to have. Good luck!

 

 


2 Replies
Anonymous
Not applicable
Author

Hi JGM,

 

Thanks for your reply.

You're entirely right. I've reorganized that remote folder by creating a script, scheduled each day just after midnight, which copies only the files from the last 24 hours to another location. My Talend job then retrieves and deletes the files from this other location, and that's it!
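For anyone with the same problem, a script like the one described could look roughly like the sketch below. This is not the actual script, just an illustration in Python: the function name `copy_recent_files`, its parameters, and the example paths are all made up, and it assumes the script runs on the machine that hosts the folder.

```python
import os
import shutil
import time


def copy_recent_files(source_dir, staging_dir, max_age_seconds=24 * 60 * 60, now=None):
    """Copy files modified within the last max_age_seconds into staging_dir.

    Hypothetical helper; returns the sorted names of the files copied.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    os.makedirs(staging_dir, exist_ok=True)
    copied = []
    # os.scandir streams directory entries instead of building one big list,
    # which helps when the folder holds hundreds of thousands of files.
    with os.scandir(source_dir) as entries:
        for entry in entries:
            if entry.is_file() and entry.stat().st_mtime >= cutoff:
                # copy2 preserves the modification time on the copy.
                shutil.copy2(entry.path, os.path.join(staging_dir, entry.name))
                copied.append(entry.name)
    return sorted(copied)


# Example call (paths are placeholders), e.g. from a job scheduled after midnight:
# copy_recent_files("/data/incoming", "/data/staging_last_24h")
```

The script still has to walk the whole source folder once per day, but it does so locally rather than over FTP, and the Talend job only ever sees the small staging folder.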

 

Regards,

 

Nicolas