Anonymous
Not applicable

[resolved] How to pick a file from S3 with latest date

Hi All,
Every 2 hours I get a new file in S3, and I have to pick the latest file, based on its timestamp, from S3.
Example: My_File_20141104000001.csv
         My_File_20141104030001.csv
Can someone help me implement this using Talend?
Thanks in advance.
Rajesh

Accepted Solutions
Anonymous
Not applicable
Author

This is a very common task, but it is not especially easy to implement in Talend.
Please have a look at my example job below and let me know if this helps, or if I can assist further.
To get the newest file, we list the files and then retrieve the properties of each one. We then sort the file properties by "mtime" (the last-modified time) in descending order and grab the first row, which is the newest file, for further processing.
1) tFileList: this component is configured to look for files whose names start with my chosen string.
2) tFileProperties: this component retrieves the properties of each file.
3) tBufferOutput: this component stores the file properties in memory so we can sort them once we have info on all the files.
4) tBufferInput: this component reads from the buffer we populated with file property information.
5) tSortRow: this component sorts the files by mtime descending (meaning the newest file will be first in the list).
6) tSampleRow: this component is how we grab only the first row coming out of tSortRow.

[Screenshots: example job and component settings]
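For readers who cannot see the screenshots, the same idea (sort by mtime descending, keep the first row) can be sketched in plain Java. This is only a minimal illustration, not the job itself: it assumes the S3 files have already been downloaded to a local folder, and the class and method names are hypothetical.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

class NewestFileByMtime {
    /**
     * Returns the most recently modified regular file in dir whose name
     * starts with prefix, or an empty Optional if there is none.
     */
    static Optional<File> newest(File dir, String prefix) {
        // Equivalent of tFileList with a name filter
        File[] candidates = dir.listFiles((d, name) -> name.startsWith(prefix));
        if (candidates == null) { // dir does not exist or is not a directory
            return Optional.empty();
        }
        // Taking the maximum mtime gives the same row as
        // "sort by mtime descending, then keep the first row"
        return Arrays.stream(candidates)
                .filter(File::isFile)
                .max(Comparator.comparingLong(File::lastModified));
    }
}
```

Calling `NewestFileByMtime.newest(new File("/some/local/folder"), "My_File")` would return the newest matching file; the folder path here is a placeholder.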


8 Replies
Anonymous
Not applicable
Author

Hi Rajesh,
Let me first check that I've captured your requirements correctly:
1. Your source folder is fixed.
2. You intend to run your job every 2 hours.
3. On every execution, you wish to pick the latest file (irrespective of its name).
Your confirmation would help in formulating a better solution.
MathurM
Anonymous
Not applicable
Author

Hi MathurM,
Thanks for your reply.
1. Your source folder is fixed.
Yes, my source folder is fixed.
2. You intend to run your job every 2 hours.
Yes, my job has to run every 2 hours.
3. On every execution, you wish to pick the latest file (irrespective of its name).
My file name will always start with the same prefix, i.e. My_File, and my job has to pick only the file that starts with My_File and has the latest date.
Thanks,
Rajesh
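Since the file names carry their own timestamp, one option is to choose the latest file from the names alone, which also works on a list of S3 keys without touching the files. The sketch below is a minimal illustration, assuming the 14 digits in the example names follow a yyyyMMddHHmmss layout (an assumption based only on the two sample names); the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LatestByNameTimestamp {
    // Matches names such as My_File_20141104030001.csv and captures the 14 digits.
    private static final Pattern NAME = Pattern.compile("My_File_(\\d{14})\\.csv");

    /** Returns the name with the greatest embedded timestamp, if any name matches. */
    static Optional<String> latest(List<String> names) {
        String best = null;
        String bestStamp = null;
        for (String name : names) {
            Matcher m = NAME.matcher(name);
            if (!m.matches()) {
                continue; // ignore files that do not follow the naming pattern
            }
            String stamp = m.group(1);
            // Fixed-width yyyyMMddHHmmss strings compare chronologically as text
            if (bestStamp == null || stamp.compareTo(bestStamp) > 0) {
                best = name;
                bestStamp = stamp;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With the two names from the question, `latest(...)` returns `My_File_20141104030001.csv`.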
Anonymous
Not applicable
Author

Hi JohnGarrettMartin, I feel that your solution above drifts a bit from the original problem.
Hi Rajesh,
I would suggest you try an approach along the lines of the job shown below.
Here:
1. We first create a start flag (assigning it a value, say 'T').
2. Using the tFileList component, we iterate over all the files in the source folder. This component itself lets us choose the order of the files: we can sort them by modified date, in either 'ASC' or 'DESC' order. In the present case, we choose 'DESC'.
3. We then process each file in turn, subject to an 'IF' condition, i.e. that the flag equals 'T'.
4. When a file has been processed successfully, we change the flag to, say, 'F' on an 'OnSubjobOk' link.
5. As a result, after the first file has been processed successfully, the flag changes from 'T' to 'F'. The 'IF' condition is then no longer fulfilled, so no further files are processed.
This way, only the latest file in the source folder is processed on every execution.
Hope this helps.
MathurM
[Screenshot: example job]
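The flag logic in the steps above can be sketched in plain Java as follows. This is not the code from the screenshot, only a minimal illustration: the HashMap stands in for Talend's globalMap, and the key "FLAG" and the class and method names are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LatestOnlyFlag {
    /** Returns the names of the files that were "processed" (at most one). */
    static List<String> processLatestOnly(File[] files) {
        // Stand-in for Talend's globalMap; "FLAG" is a hypothetical key name.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("FLAG", "T"); // step 1: start flag

        // Step 2: order by modified date, DESC (newest first)
        File[] sorted = files.clone();
        Arrays.sort(sorted, (a, b) -> Long.compare(b.lastModified(), a.lastModified()));

        List<String> processed = new ArrayList<>();
        for (File f : sorted) {
            // Step 3: the 'IF' condition
            if ("T".equals(globalMap.get("FLAG"))) {
                processed.add(f.getName()); // placeholder for the real processing
                // Step 4: after successful processing, switch the flag off
                globalMap.put("FLAG", "F");
            }
        }
        return processed; // step 5: only the newest file was handled
    }
}
```

In an actual job, the two `globalMap.put` calls would typically sit in separate tJava components, and the condition would go on the 'IF' link.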
Anonymous
Not applicable
Author

Hi,
Do you have the rights to move files from the S3 bucket to another folder?
If yes, then once a file has been processed, move it to an archive folder. This is much simpler than implementing workarounds.
Vaibhav
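As a rough illustration of the archive idea, here is a minimal Java sketch that moves a processed file into an archive folder. It assumes the file is on a local file system; on S3 itself a move generally amounts to a copy followed by a delete. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class ArchiveAfterProcessing {
    /** Moves file into archiveDir (created if missing) and returns the new path. */
    static Path archive(Path file, Path archiveDir) throws IOException {
        Files.createDirectories(archiveDir);
        Path target = archiveDir.resolve(file.getFileName());
        // Overwrite an older archived copy with the same name, if one exists
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Once every processed file is archived this way, the source folder only ever contains unprocessed files, so the job no longer has to work out which one is the latest.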
Anonymous
Not applicable
Author

Hi,

Can I get assistance with this solution? I am currently working on the same issue (picking up data from an S3 bucket based on the latest file).

Tasfiahm
Creator

Hi Mathur,

Thanks for the advice. I know it has been five years since your post, but could you please add the tJava code, or a screenshot of the tJava component, that you used to select the latest file?

Thanks,

T.A

NNot_defined1674577304
Contributor II

Please share the tJava code and the other vital components needed to accomplish this job. Thanks.