S3 to Redshift issues.

I am trying to move data from several files in an S3 bucket into Redshift tables. I have tried several options and would like to know the best approach.
Note that I can copy the files from S3 into the Redshift tables without Talend and it works great. However, I want to use Talend because I will be using this for incremental loads (without dropping/clearing the tables).
I have tried the following options:
1. tS3List --> tRedshiftBulkExec: fails with "delimiter not found". I am still trying to identify why this occurs; the plain COPY works fine.
- tRedshiftBulkExec also requires me to re-enter my S3 credentials. This seems unnecessary, as I already have an S3 connection; it would be great if the component could reuse an existing connection.
2. tRedshiftRow: I enter the COPY command here (roughly like the sketch after this list) and it works. However, this is not ideal when using an ETL tool; I could just as well use basic scripting. Ideally option 1 should work.
3. S3 --> tS3Get (writes to local disk, ugh!) --> tFileInput (from local) --> tRedshiftOutput: this defeats the purpose of S3 to Redshift.
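For reference, the COPY statement from option 2 is along these lines. The bucket, prefix, table name, credentials, and file format options below are placeholders/assumptions, not the actual values:

COPY public.sales_staging
FROM 's3://my-bucket/incoming/sales/'
CREDENTIALS 'aws_access_key_id=<access-key>;aws_secret_access_key=<secret-key>'
DELIMITER ','        -- must match the delimiter actually used in the files
IGNOREHEADER 1       -- skip a header row, if the files have one
GZIP                 -- only if the files are gzip-compressed
REGION 'us-east-1';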
What is the best way to accomplish this using Talend Open Studio for Big Data 6.1.1?
Reply from Anonymous:

tRedshiftRow is the best way. While it is "not ideal when using an ETL tool", it is ideal for Redshift. Routing the files through Talend adds an unnecessary step, and you may lose the parallelism of a direct S3 -> Redshift COPY.
Also keep in mind that a Redshift update is implemented as a delete plus insert, and the deleted rows stick around until you vacuum the table. Consider adding a tRedshiftRow step for the VACUUM to avoid performance degradation.
A complete process may look like:
s3 -> staging -> update -> vacuum
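In SQL terms, each tRedshiftRow step might contain something like the following sketch. The table names and the join key are made-up placeholders, not details from the original post:

-- 1. Bulk load the new S3 files into an empty staging table (same COPY as in the question)
COPY public.sales_staging
FROM 's3://my-bucket/incoming/sales/'
CREDENTIALS 'aws_access_key_id=<access-key>;aws_secret_access_key=<secret-key>'
DELIMITER ',' GZIP;

-- 2. "Update" the target: delete the rows being replaced, then insert the new versions
BEGIN;
DELETE FROM public.sales
USING public.sales_staging
WHERE public.sales.sale_id = public.sales_staging.sale_id;
INSERT INTO public.sales SELECT * FROM public.sales_staging;
COMMIT;

-- 3. Reclaim the space left by the deleted rows and re-sort (VACUUM cannot run inside a transaction)
VACUUM public.sales;
TRUNCATE public.sales_staging;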
If you haven't already, I suggest testing whether a truncate/insert process is faster than an update, as the Redshift architecture is optimized for bulk processing. A minimal variant of that is sketched below.
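For example, if the target table can simply be rebuilt from the incoming files on each run, the truncate/insert variant reduces to something like this (again with placeholder table names):

TRUNCATE public.sales;
INSERT INTO public.sales SELECT * FROM public.sales_staging;
-- TRUNCATE commits immediately and leaves no deleted rows behind, so no VACUUM is needed for this path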