Hello everyone,
I have a question regarding the tS3Put component of Talend.
The use case is that there is sensitive data in a MySQL database (and in another MS SQL Server database), and the tables have to be copied to an S3 bucket. I do not want to store any tables locally as files but stream them directly to the AWS S3 bucket (probably in an anonymized form). As I understand the tS3Put documentation, it should be possible to "upload data onto S3 from cache memory via the streaming mode" (Talend Documentation), which has been supported since Talend Studio 7.
How do I achieve that?
More information about a simple job is provided by the screenshots.
@tomtailor, check the link below on how to use the stream option.
https://help.talend.com/reader/og0CEvPpYA_lek9vzkTKig/FqKUw_PbNWEpKBOGkBcAeQ
Thanks @manodwhb, but I have already seen this page and it does not help me. It is more about how to use the tS3Put component with "Stream" from a CSV file (tFileOutputDelimited) in memory.
How can I do this?
Hey Tom,
Did you ever figure out how to configure this? I'm looking to do the same sort of thing and I haven't been able to get stream to work with the tS3Put component.
Any help or guidance would be greatly appreciated.
Best,
Ryan
Hello,
For a simple use-case you can refer to this sample job here:
https://github.com/Talend/tuj/tree/master/tuj/java/Amazon/S3/TDI40201_tS3Put_SmallStream
It basically uses:
java.io.FileInputStream stream = new java.io.FileInputStream(context.data_output_dir + "/" + jobName + "/out.csv");
globalMap.put("stream", stream);
And inside tS3Put:
(java.io.FileInputStream)globalMap.get("stream")
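To make the hand-off above easier to follow outside of Talend Studio, here is a minimal standalone sketch. It assumes globalMap behaves like a plain String-to-Object map (the HashMap below is only a stand-in), and the class name, method name and temp file are made up for illustration. Note that this variant still reads a file from disk, so on its own it does not cover the "no local files" requirement from the original question.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class GlobalMapStreamSketch {
    // Stand-in for Talend's globalMap (assumption: a plain String-to-Object map).
    static final Map<String, Object> globalMap = new HashMap<>();

    // Writes csv to a temp file, publishes a stream under "stream", then consumes it.
    static String putThenConsume(String csv) throws IOException {
        Path file = Files.createTempFile("out", ".csv");
        try {
            Files.write(file, csv.getBytes(StandardCharsets.UTF_8));

            // Producer side (the tJava step): open the file and publish the stream.
            globalMap.put("stream", new FileInputStream(file.toFile()));

            // Consumer side (what tS3Put would be handed): fetch, cast, read to the end.
            try (InputStream in = (FileInputStream) globalMap.get("stream")) {
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    sink.write(buf, 0, n);
                }
                return new String(sink.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            globalMap.remove("stream");
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(putThenConsume("id;name\n1;alice\n"));
    }
}
```

The cast on the consumer side has to match the type that was stored, which is why the tS3Put expression casts to java.io.FileInputStream.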
I also did a quick test with multiple threads.
tSleep is not needed but it proves that we can flush and close the OutputStream before we start to consume it.
This relies on Java's PipedInputStream / PipedOutputStream.
We create an Output and an Input stream and link them together.
(Inspiration came from https://stackoverflow.com/questions/5778658/how-to-convert-outputstream-to-inputstream )
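A side note on the tSleep remark, as a hedged sketch rather than anything taken from the sample job: writing everything and closing the pipe before the reader starts only completes on its own when the data fits into the pipe's internal buffer. A PipedInputStream created with the no-argument constructor has a 1024-byte buffer, and a writer that fills it blocks until a reader drains some bytes. The PipedInputStream(int pipeSize) constructor allows a larger buffer. The class and method names below are made up for illustration.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

class PipeBufferSketch {
    // Writes payloadSize bytes and closes the pipe before anything is read,
    // then drains the pipe on the same thread and returns the byte count.
    static int writeCloseThenRead(int pipeSize, int payloadSize) throws IOException {
        if (payloadSize > pipeSize) {
            // Without a concurrent reader this write would block forever.
            throw new IllegalArgumentException("payload does not fit into the pipe buffer");
        }
        PipedInputStream in = new PipedInputStream(pipeSize);
        PipedOutputStream out = new PipedOutputStream(in);

        out.write(new byte[payloadSize]);
        out.flush();
        out.close(); // marks end-of-stream for the reader

        int total = 0;
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        in.close();
        return total;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeCloseThenRead(64 * 1024, 50_000));
    }
}
```

For tables larger than the chosen buffer, the writer and the reader have to run in parallel, which is what the multi-threaded job below is for.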
In a tJava we prepare these streams:
java.io.PipedInputStream in = new java.io.PipedInputStream();
final java.io.PipedOutputStream out = new java.io.PipedOutputStream(in);
globalMap.put("inputStream", in);
globalMap.put("outputStream",out);
Once this is done we can create our parallel threads via tParallelize.
tFileOutputDelimited will use a stream and write to:
((java.io.PipedOutputStream)globalMap.get("outputStream"))
Once it has finished writing, we need to flush and close the stream (in a tJava):
java.io.PipedOutputStream outPipe = ((java.io.PipedOutputStream)globalMap.get("outputStream"));
outPipe.flush();
outPipe.close();
Meanwhile in the tS3Put we use:
((java.io.PipedInputStream)globalMap.get("inputStream"))
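For anyone who wants to try the pattern outside of a job first, the steps above can be sketched in plain Java. This is only an illustration under assumptions: the producer thread stands in for the tFileOutputDelimited branch, the loop on the calling thread stands in for whatever tS3Put does with the input stream, and the class and method names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class PipedStreamSketch {
    // Writes the rows into a pipe on a producer thread and drains the pipe
    // on the calling thread, returning everything that came through.
    static String roundTrip(String[] rows) throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        // Producer: plays the role of the branch that writes delimited rows.
        Thread producer = new Thread(() -> {
            // Closing the writer flushes and closes the pipe's output side.
            try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
                for (String row : rows) {
                    w.write(row);
                    w.write('\n');
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        // Consumer: plays the role of the component reading the input stream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) { // -1 arrives once the producer has closed
            sink.write(buf, 0, n);
        }
        producer.join();
        in.close();
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.print(roundTrip(new String[] {"id;name", "1;alice", "2;bob"}));
    }
}
```

The close on the output side matters: without it the reading side never sees the end of the stream and keeps waiting.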
I hope this helps.