Hi,
I'm trying to retrieve files from a remote folder by filtering them on their creation/modification date.
Is there a way to do this with a tFTPGet component, using a regexp in the filemask on the mtime property?
I know there is a tFTPFileProperties component, but the remote folder in question contains about 250,000 files and I only want to retrieve yesterday's files, so browsing the whole folder would be time-consuming.
If not, are there any other ways?
Thank you very much !
Nicolas
First, a word of warning from experience: tons of files piling up in one directory every day will cause headaches. Eventually, OS tools will start to break. I've seen dirs like this where "ls" breaks, and that's super annoying. I know it will be difficult to change, but you should seriously consider re-organizing that remote folder before the problems become more serious.
With that said, here's an approach that may work out for you:
First, you'll need to maintain a list of the file names you've already seen. This list should be updated every time you process a file.
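A minimal Java sketch of keeping that list, assuming it is stored as a plain text file with one file name per line. The class name `SeenFiles`, its methods, and the store file are hypothetical illustrations, not part of Talend.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: persists already-processed file names, one per line.
public class SeenFiles {
    // Load the names recorded so far; a missing store means nothing has been seen yet.
    static Set<String> load(Path store) throws IOException {
        if (!Files.exists(store)) {
            return new LinkedHashSet<>();
        }
        return new LinkedHashSet<>(Files.readAllLines(store, StandardCharsets.UTF_8));
    }

    // Append one processed file name, creating the store on first use.
    static void record(Path store, String fileName) throws IOException {
        Files.write(store, List.of(fileName), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

Appending one line per processed file keeps the update cheap, so the list can be written right after each successful download.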
Once you have this list, you can generate a filemask regex that only matches file names you have not seen yet. It would look something like:
^(?!(file1|file2|file3)$)
Then you should be left with a much smaller set of files that you can further filter with a tFTPFileProperties if needed.
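A minimal Java sketch of generating such a mask from the seen list. It assumes the mask is applied as a whole-name match with `java.util.regex`, which is why a trailing `.*` follows the lookahead; each name goes through `Pattern.quote` so characters like `.` are matched literally. `UnseenMask`, `buildMask` and `unseen` are hypothetical names.

```java
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: builds a "not already seen" filemask of the form ^(?!(a|b|c)$).*
public class UnseenMask {
    static String buildMask(Collection<String> seen) {
        if (seen.isEmpty()) {
            return ".*"; // nothing seen yet: accept every name
        }
        // Quote each name so dots and other regex metacharacters are literal.
        String alternatives = seen.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Negative lookahead rejects exact matches; .* then consumes the whole name.
        return "^(?!(" + alternatives + ")$).*";
    }

    // Keep only the names the mask accepts as a full match.
    static List<String> unseen(List<String> names, Collection<String> seen) {
        Pattern mask = Pattern.compile(buildMask(seen));
        return names.stream()
                .filter(name -> mask.matcher(name).matches())
                .collect(Collectors.toList());
    }
}
```

Note that the pattern grows with every processed file, so with a very long history it may become unwieldy; pruning names that no longer exist on the server would keep it smaller.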
This is not an easy problem to have. Good luck!
Hi JGM,
Thanks for your reply.
You're entirely right. I've re-organized that remote folder with a script, scheduled every day just after midnight, that copies only the files from the last 24 hours to another location. My Talend job then retrieves and deletes the files from that other location, and that's it!
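The script's language isn't stated, so here is a minimal Java sketch of the same idea, assuming local access to the folder: copy every regular file whose last-modified time falls within a window (e.g. 24 hours) into a target directory. `CopyRecent` and `copyRecent` are hypothetical names, and the paths are up to you.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: copies files modified within `window` before `now` into `target`.
public class CopyRecent {
    static List<Path> copyRecent(Path source, Path target, Duration window, Instant now)
            throws IOException {
        Instant cutoff = now.minus(window);
        Files.createDirectories(target);
        List<Path> copied = new ArrayList<>();
        // DirectoryStream iterates entries lazily instead of building the full listing first.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(source)) {
            for (Path entry : entries) {
                if (!Files.isRegularFile(entry)) {
                    continue; // skip subdirectories and other non-regular entries
                }
                Instant mtime = Files.getLastModifiedTime(entry).toInstant();
                if (mtime.isAfter(cutoff)) {
                    Path dest = target.resolve(entry.getFileName());
                    // COPY_ATTRIBUTES keeps the original modification time on the copy.
                    Files.copy(entry, dest, StandardCopyOption.REPLACE_EXISTING,
                            StandardCopyOption.COPY_ATTRIBUTES);
                    copied.add(dest);
                }
            }
        }
        return copied;
    }
}
```

Run just after midnight with a 24-hour window, this picks up roughly the previous day's files.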
Regards,
Nicolas