
Improving data load speed to an Amazon Redshift database
Hello Team,
I have a daily job that loads ~6 million rows daily (truncate and reload). My target database is Amazon Redshift, and I have used tRedshift_output as the output component. At present the job takes somewhere between 4 and 5 hours to complete. Is there a way to improve the performance of this job? Can I use any other component to improve the performance? My source is a somewhat complex SQL query (4 full outer join statements).
Accepted Solutions

Hi sushantk19 (Customer),
That's great :) So try to do the UNLOAD with PARALLEL and GZIP, then do the COPY with the GZIP parameter, and let me know if it helps 🙂
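A rough sketch of what that could look like (the query, S3 path, and credentials below are placeholders, not taken from this thread):

```sql
-- Export the source data to S3 in parallel, compressed with gzip
UNLOAD ('SELECT * FROM your_source_table')
TO 's3://your-bucket/unload/your_source_table_'
CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your-secret-key>'
PARALLEL ON
GZIP;

-- Load the compressed files into the target table
COPY your_target_table
FROM 's3://your-bucket/unload/your_source_table_'
CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your-secret-key>'
GZIP;
```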

Hi sushantk19,
You will get the best performance by using the COPY command in a tDBRow component. You can write the TRUNCATE TABLE and COPY commands in one BEGIN ... END transaction block. One condition: your source data has to be on S3.
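For example, something along these lines (table name, S3 path, credentials, and file-format options are placeholders to adapt):

```sql
BEGIN;

-- Empty the target table before reloading it
-- (note: in Redshift, TRUNCATE commits the current transaction immediately)
TRUNCATE TABLE your_target_table;

-- Bulk load the files staged on S3; adjust format options to match your files
COPY your_target_table
FROM 's3://your-bucket/staging/'
CREDENTIALS 'aws_access_key_id=<your-access-key>;aws_secret_access_key=<your-secret-key>'
DELIMITER ','
GZIP;

END;
```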
Regards

@Kamil Kowalczyk : I used the tRedshiftOutputBulkExec component, but I get the below error:
Exception in component tDBOutputBulkExec_1_tROB (job_DM_002_DIM_Subscription_tmp1_history)
com.amazonaws.services.s3.model.AmazonS3Exception: The authorization header is malformed; a non-empty Access Key (AKID) must be provided in the credential. (Service: Amazon S3; Status Code: 400; Error Code:
I am new to Talend. Can you tell me how to configure the S3 folder in AWS? I guess that if I use the above component, it will first copy the content to the S3 folder and then write to the target table... isn't that right?
Also, can you suggest the steps to set up S3? I have attached a screenshot of the tDBOutputBulkExec component.

If you want to use S3, you have to know the credentials: the access key and the secret key.
Amazon has really great documentation. Please check this link:
https://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-data.html
I think you will find everything you need there.
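For anyone landing here later, those credentials end up in the COPY statement in one of two common forms; this is only a sketch with placeholder values:

```sql
-- Option 1: key-based credentials (placeholder values)
COPY your_target_table
FROM 's3://your-bucket/staging/'
ACCESS_KEY_ID '<your-access-key>'
SECRET_ACCESS_KEY '<your-secret-key>'
GZIP;

-- Option 2: an IAM role attached to the cluster (placeholder ARN)
COPY your_target_table
FROM 's3://your-bucket/staging/'
IAM_ROLE 'arn:aws:iam::123456789012:role/YourRedshiftRole'
GZIP;
```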

@Kamil Kowalczyk : Have you implemented this before? Can you share a screenshot of your S3 settings?

What exactly do you want to know? Sample screenshot:

@Kamil Kowalczyk : I provided the access key, bucket, and secret key.
I left the Key field blank, and for Region I chose DEFAULT.
I get the below error:
Exception in component tDBOutputBulkExec_1_tROB (job_DM_003_DIM_Subscription_tmp1_history)
com.amazonaws.SdkClientException: Unable to execute HTTP request: Connection reset by peer: socket write error
What could be the cause of this?

Hello All,
Any inputs on my above issue? Is it mandatory to provide the Key value?

It looks like a firewall is blocking your traffic, or maybe you don't have permissions on your S3 bucket.

The S3 access issue has been resolved now, but I get a new error:
Exception in component tDBOutputBulkExec_1_tRBE (job_DM_003_DIM_Subscription_tmp1_history)
java.sql.SQLException: [Amazon](500310) Invalid operation: Load into table 'payments_orders_subscriptions' failed. Check 'stl_load_errors' system table for details
I checked the ETL and it is the same as the regular flow without the bulk exec component. I also checked stl_load_errors: the error is attached.
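In case it helps anyone following along, the most recent load failures can be pulled from that system table with something like the query below (the column selection is just a suggestion):

```sql
-- Show the most recent COPY failures with the offending line and reason
SELECT starttime,
       filename,
       line_number,
       colname,
       type,
       raw_line,
       err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```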
