Anonymous
Not applicable

How to use "Stream" in tS3Put component

Hello everyone,

I have a question regarding the tS3Put component of Talend.


The use case is that there is sensitive data in one MySQL database (and in another MS SQL Server database), and the tables have to be copied to an S3 bucket. I do not want to store any tables locally as files; instead I want to stream them directly to the AWS S3 bucket (possibly in an anonymized form). As I understand the tS3Put documentation, it should be possible to "upload data onto S3 from cache memory via the streaming mode" (Talend Documentation), which has been supported since Talend Studio 7.

 

How do I achieve that?

 

More information about a simple job is provided by the screenshots.

[Screenshots: 01tS3Put, 02tS3Put, 03tS3Put, 04tS3Put, 05tS3Put]

4 Replies
manodwhb
Champion II

Anonymous
Not applicable
Author

Thanks @manodwhb, but I have already seen that page and it does not help me. My question is more about how to use the tS3Put component with "Stream" fed from a CSV file (tFileOutputDelimited) held in memory.

 

How can I do this?

ryanmcnulty
Contributor

Hey Tom,

 

Did you ever figure out how to configure this? I'm looking to do the same sort of thing and I haven't been able to get stream to work with the tS3Put component.

 

Any help or guidance would be greatly appreciated.

 

Best,

Ryan

Anonymous
Not applicable
Author

Hello,

 

For a simple use-case you can refer to this sample job here:

https://github.com/Talend/tuj/tree/master/tuj/java/Amazon/S3/TDI40201_tS3Put_SmallStream

It basically uses:

java.io.FileInputStream stream = new java.io.FileInputStream(context.data_output_dir + "/" + jobName + "/out.csv");

globalMap.put("stream", stream);

 

And inside tS3Put:

(java.io.FileInputStream)globalMap.get("stream")

 

I also did a quick test with multiple threads.

[Screenshot of the test job]

tSleep is not needed, but it proves that we can flush and close the OutputStream before we start to consume it.

 

This relies on Java's PipedInputStream / PipedOutputStream.

We create an Output and an Input stream and link them together.

(Inspiration came from https://stackoverflow.com/questions/5778658/how-to-convert-outputstream-to-inputstream )

 

In a tJava we prepare these streams:

java.io.PipedInputStream in = new java.io.PipedInputStream();

final java.io.PipedOutputStream out = new java.io.PipedOutputStream(in);

 

globalMap.put("inputStream", in);

globalMap.put("outputStream",out);
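
One caveat about the stream setup above: in the standard JDK implementation a PipedInputStream created with the no-argument constructor has a buffer of only 1024 bytes, and the writer blocks once that buffer is full until a reader drains it. Writing everything first and reading afterwards therefore only works when the data fits into the pipe. The sketch below is plain Java outside Talend; the class name, method name and sizes are hypothetical and only illustrate the `PipedInputStream(int pipeSize)` constructor:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

class PipeBufferSketch {

    // Writes payloadSize bytes, closes the writer, and only then reads,
    // all in one thread. This only returns if the pipe buffer can hold
    // the whole payload; otherwise write() would block forever.
    static int writeThenRead(int pipeSize, int payloadSize) throws IOException {
        PipedInputStream in = new PipedInputStream(pipeSize);
        PipedOutputStream out = new PipedOutputStream(in);

        out.write(new byte[payloadSize]);
        out.flush();
        out.close(); // marks end of stream for the reader

        int total = 0;
        byte[] buffer = new byte[512];
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        in.close();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // 4000 bytes fit into an 8192-byte pipe, so no concurrent reader is needed.
        System.out.println(writeThenRead(8192, 4000));
    }
}
```

When the writer and the reader run in parallel, as in the job described here, the small default buffer is not a problem because the reader keeps draining the pipe.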

 

Once this is done we can create our parallel threads via tParallelize.

 

tFileOutputDelimited is configured to use an output stream and writes to:

((java.io.PipedOutputStream)globalMap.get("outputStream"))

 

Once writing is finished we need to flush and close the stream (tJava):

java.io.PipedOutputStream outPipe = ((java.io.PipedOutputStream)globalMap.get("outputStream"));

outPipe.flush();

outPipe.close();

 

Meanwhile in the tS3Put we use:

((java.io.PipedInputStream)globalMap.get("inputStream"))
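
To see the whole pattern in one place, here is a minimal sketch in plain Java, outside Talend. It is not the job itself: the producer thread stands in for tFileOutputDelimited plus the flushing tJava, the loop on the calling thread stands in for whatever tS3Put does with the input stream, and the class name, method name and sample rows are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

class PipedStreamSketch {

    // Writes the rows to the pipe in a producer thread while the calling
    // thread consumes them, and returns everything the consumer read.
    static String roundTrip(String[] rows) throws Exception {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        // Producer side: write, flush, then close (try-with-resources closes).
        Thread producer = new Thread(() -> {
            try (PipedOutputStream o = out) {
                for (String row : rows) {
                    o.write((row + "\n").getBytes(StandardCharsets.UTF_8));
                }
                o.flush();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        producer.start();

        // Consumer side: read until the writer has closed its end.
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        int n;
        while ((n = in.read(buffer)) != -1) {
            received.write(buffer, 0, n);
        }
        in.close();
        producer.join();
        return received.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.print(roundTrip(new String[] {"id;name", "1;alice", "2;bob"}));
    }
}
```

Closing the output side matters: it is what makes read() return -1, so the consumer knows the data is complete instead of waiting for more.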

 

I hope this helps.