Doing transformation for Data from S3 On the Fly

Anonymous — Mon, 14 Nov 2016 08:13:39 GMT

Hi Expert,

I have a case where I need to transform data that comes from S3, either on the fly or as part of the ETL process.

Could you please suggest a solution, or tell me which component I should use?
Currently I think the only possible way is to download the data locally first, use the local flat file as the source, apply the transformation, and then put the clean data back in S3 as a single file.
In the end, this data will be inserted into AWS Redshift (RS).
(I think the samples provided with Talend use tRedshiftBulkExec and load a whole CSV file from S3 into a table. In my case, I need to do some transformation before loading the data into RS.)

Thanks in advance

Re: Doing transformation for Data from S3 On the Fly

Anonymous — Mon, 14 Nov 2016 08:37:38 GMT

S3 is not a local file system. Rather, it is accessible via REST, SOAP or BitTorrent (see https://en.wikipedia.org/wiki/Amazon_S3). Thus, no matter which approach you use to work with S3, you will either explicitly or implicitly have to copy the file locally, then process it, then upload it again. If the files are small enough, you can deal with them in memory.
So for your case: use a tS3Get to fetch the file; assuming CSV data, read it with a tFileInputDelimited; then add your processing components; then you can use the tRedshiftBulkExec with the prepared file.
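
Outside of Talend, the same download, process, upload pattern can be sketched in a few lines of Python. This is only an illustration under assumptions, not what the Talend components do internally: the clean_row rule (trim whitespace, drop rows with an empty first column) is a made-up transformation, the function names, the bucket and key arguments and the raw.csv/clean.csv file names are hypothetical, and the S3 part assumes the third-party boto3 package plus configured AWS credentials.

```python
import csv
import io


def clean_row(row):
    """Hypothetical rule: trim every field, drop rows whose first column is empty."""
    cleaned = [field.strip() for field in row]
    return cleaned if cleaned and cleaned[0] else None


def transform_csv(src, dst, delimiter=","):
    """Stream rows from src to dst one at a time; return the number of rows written."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    written = 0
    for row in reader:
        cleaned = clean_row(row)
        if cleaned is not None:
            writer.writerow(cleaned)
            written += 1
    return written


def process_s3_object(bucket, raw_key, clean_key, workdir="."):
    """Copy the object locally, transform it, and upload one clean file."""
    import os
    import boto3  # third-party; imported here so transform_csv works without it

    s3 = boto3.client("s3")
    raw_path = os.path.join(workdir, "raw.csv")      # hypothetical local names
    clean_path = os.path.join(workdir, "clean.csv")
    s3.download_file(bucket, raw_key, raw_path)
    with open(raw_path, newline="") as src, open(clean_path, "w", newline="") as dst:
        transform_csv(src, dst)
    s3.upload_file(clean_path, bucket, clean_key)
```

Because transform_csv only needs file-like objects, small files can be handled entirely in memory by passing io.StringIO buffers instead of local files. The uploaded clean file is then what the Redshift bulk load would point at.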

Thomas
