sri_21
Contributor

How to edit files in an S3 bucket using Talend

Hi all,

 

My current scenario is: upload a file to an S3 bucket, apply some transformations once the file is available in the bucket, and then load the transformed file back to the same S3 bucket under a new name.

I'm able to upload the file to the S3 bucket, but I'm not able to read the file and apply a simple transformation in the job.

Question 1: Is it possible for a Talend job to edit a txt or csv file that is in an S3 bucket?

Question 2: If yes, how does the job design need to be modified?

Please have a look at the screenshot of my job design.

 

Regards,

SS


Accepted Solutions
Anonymous
Not applicable

Hi,

 

My understanding is that you are processing a single source file that you want to copy to S3, and that you also want to modify the data and store the modified file in S3 as well. In that scenario, you only need tS3Put to load the data to S3 after your modifications.

 

However, if there are multiple source files that you would like to move to S3, you will have to use a tFileList to iterate over them and process each file before it goes to S3.

 

If your situation is different and S3 is the source file location, you will have to bring the file to a local directory first and then modify it using other Talend components. Once the modification is complete, you can push the file back to the S3 bucket using tS3Put.
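For anyone who wants to see the same download, modify, upload flow outside of the Talend designer, here is a minimal Python sketch. It is only an illustration of the pattern, not a Talend job: the function names, the example transformation, and the temporary file names are all made up for this post, and `s3_client` is assumed to be a boto3 S3 client (anything with compatible `download_file` and `upload_file` methods would work). In a Talend job, the three steps would roughly correspond to a tS3Get, a local file transformation, and a tS3Put.

```python
import csv
import os
import tempfile


def transform_rows(rows):
    """Hypothetical transformation: trim every cell and upper-case the first column."""
    for row in rows:
        cleaned = [cell.strip() for cell in row]
        if cleaned:
            cleaned[0] = cleaned[0].upper()
        yield cleaned


def process_s3_csv(s3_client, bucket, source_key, target_key):
    """Download source_key, transform it locally, and upload the result as target_key."""
    with tempfile.TemporaryDirectory() as workdir:
        local_in = os.path.join(workdir, "input.csv")
        local_out = os.path.join(workdir, "output.csv")

        # Step 1: bring the file from S3 to local disk.
        s3_client.download_file(bucket, source_key, local_in)

        # Step 2: apply the transformation on the local copy.
        with open(local_in, newline="") as src, open(local_out, "w", newline="") as dst:
            csv.writer(dst).writerows(transform_rows(csv.reader(src)))

        # Step 3: push the modified file back to the bucket under a new name.
        s3_client.upload_file(local_out, bucket, target_key)
```

Assuming boto3 is installed and credentials are configured, a call could look like `process_s3_csv(boto3.client("s3"), "my-bucket", "in/data.csv", "out/data_clean.csv")`, where the bucket and key names are placeholders.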

 

Warm Regards,

 

Nikhil Thampi


3 Replies
Anonymous
Not applicable

Hi,

 

Could you please remove tS3List_1 and use the file you are passing as the source of tS3Put_1 as the source for the transformation in the next subjob? Once the transformation is complete, you can push the modified file to S3 as well using tS3Put.

 

     Below is the skeleton diagram of the process.

[Image: 0683p000009M0QC.png]

 

 

If you have more than one file to process, you can put these components in a child job and pass the file name to it as a parameter. Then you can call the child job iteratively until all the files in your source folder have been processed successfully.
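The iteration idea above can be sketched in plain Python as follows. This is only an analogy for the parent job / child job design, not Talend code: `process_folder`, its `pattern` argument, and the `process_one` callback (standing in for the child job that receives the file name) are hypothetical names chosen for this illustration.

```python
from pathlib import Path


def process_folder(source_dir, process_one, pattern="*.csv"):
    """Call process_one(path) for every matching file and return the names processed."""
    processed = []
    # Sort so the files are handled in a predictable order.
    for path in sorted(Path(source_dir).glob(pattern)):
        if path.is_file():
            # The "child job": it receives one file at a time as its parameter.
            process_one(path)
            processed.append(path.name)
    return processed
```

The callback would be where each file is transformed and then uploaded, so the loop itself stays unaware of what the per-file processing does.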

 

If the answer has helped you, could you please mark the topic as resolved? Kudos are also welcome 🙂

 

Warm Regards,

 

Nikhil Thampi

sri_21
Contributor
Author

Hi Nikhil,

The proposed solution won't work in my case: tS3Put only has the source location of the file, and tFileInputDelimited also needs the source file location, but it cannot be pointed at a local directory because the file is in S3. To get the file location I need to use tS3List to list the files and get the name of the file to process.

So my question is whether Talend can apply some minor transformations or cleansing to a file inside the S3 bucket and write the result back to the S3 bucket, so that the final transformed or cleansed data can then be moved to a cloud DB for further processing.
