Hi Everyone,
I am pulling files from S3 and loading the data they contain into database tables.
I also have a table in Oracle that stores the names of all the files I pull from S3 and load into the database (for auditing purposes).
Suppose someone uploads a new file to S3 that has no record in the database.
I need to check whether the name of each file I pull from S3 already exists in the database or whether it is a new file.
Do you know how this can be achieved?
I tried a join in tMap against a tOracleInput (which reads the names of all records from the DB),
but I cannot connect the links from my previous job, which pulls the files from S3.
Do you know any other way to do this?
Regards,
Mohit
Hi Mohit,
The logic below will help you identify whether a file from S3 is new or not. I have created only a skeleton flow, and you will need to expand it based on your requirements.
In the flow above, we first fetch the list of file names from S3 and store it in a tHashOutput component. Once all the file names are stored, we read them back using a tHashInput (don't forget to select the "Clear cache after reading" option) and then do an inner join with the Oracle table in tMap. All the existing files will be present in the main flow of tMap, and all the new files will go to the reject flow of tMap (since no matching records are available in the Oracle DB).
The main trick is in tRowGenerator, where we pick up each file name only once (by setting the number of rows to be generated to 1). The screenshots of the main components are below.
[Screenshot: tRowGenerator settings]
[Screenshot: tMap settings]
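For anyone who finds it easier to read the join as code, here is a minimal Java sketch of the same idea outside Talend: keep only the S3 file names that have no match in the audit table. The class name, method name and sample file names are all hypothetical, and the two collections simply stand in for the tHashInput flow and the Oracle lookup. It is not what tMap generates internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NewFileFilter {

    // Returns the S3 file names that are NOT in the audit table,
    // i.e. the rows that would land in the inner join reject flow.
    // Each new name is returned only once, in the order first seen.
    public static List<String> findNewFiles(List<String> s3FileNames,
                                            Set<String> auditedFileNames) {
        List<String> newFiles = new ArrayList<>();
        Set<String> alreadyEmitted = new HashSet<>();
        for (String name : s3FileNames) {
            boolean existsInAudit = auditedFileNames.contains(name);
            // Set.add returns false if the name was already emitted
            if (!existsInAudit && alreadyEmitted.add(name)) {
                newFiles.add(name);
            }
        }
        return newFiles;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: names listed from S3
        List<String> s3FileNames =
                Arrays.asList("sales_01.csv", "sales_02.csv", "sales_03.csv");
        // Hypothetical sample data: names already recorded in the Oracle audit table
        Set<String> auditedFileNames =
                new HashSet<>(Arrays.asList("sales_01.csv", "sales_02.csv"));

        System.out.println(findNewFiles(s3FileNames, auditedFileNames));
        // prints [sales_03.csv]
    }
}
```

The comparison here is an exact, case-sensitive string match, so if the names in S3 and in the audit table can differ in case or path prefix, they would need to be normalised before comparing (the same applies to the join key in tMap).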
I hope my answer has helped to clear your query. Could you please mark the topic as resolved so that it will help the Talend community? Kudos are also welcome 🙂
Warm Regards,
Nikhil Thampi