sushantk19
Creator

Improving the data load speed to an Amazon Redshift database

Hello Team,

I have a daily job which loads ~6 million rows daily (truncate and reload). My target database is Amazon Redshift, and I have used tRedshiftOutput as the output component. Presently the job takes somewhere between 4 and 5 hours to complete. Is there a way to improve the performance of this job? Can I use any other component to improve the performance? My source is a slightly complex SQL query (4 full outer joins).

1 Solution

Accepted Solutions
pakapi_
Creator

Hi sushantk19 (Customer),

 

That's great 🙂 So try to do the UNLOAD with PARALLEL and GZIP, then do the COPY with the GZIP parameter, and let me know if it helps 🙂
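For reference, a minimal SQL sketch of that pattern, assuming the source query can be run and UNLOADed from a Redshift cluster; the bucket, table name, and credentials below are placeholders, not values from this thread:

-- 1) Unload the source query result to S3 in parallel slices, gzip-compressed.
UNLOAD ('SELECT ... your source query ...')
TO 's3://your-bucket/staging/subscriptions_'
CREDENTIALS 'aws_access_key_id=<ACCESS_KEY>;aws_secret_access_key=<SECRET_KEY>'
GZIP
PARALLEL ON
ALLOWOVERWRITE;

-- 2) Copy the gzipped files back into the target table.
COPY your_target_table
FROM 's3://your-bucket/staging/subscriptions_'
CREDENTIALS 'aws_access_key_id=<ACCESS_KEY>;aws_secret_access_key=<SECRET_KEY>'
GZIP;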

 

 


20 Replies
pakapi_
Creator

Hi sushantk19,

 

You will get the best performance by using the COPY command in a tDBRow component. You can write the TRUNCATE TABLE and COPY commands in one BEGIN...END transaction block. One condition: your source data has to be on S3.
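A minimal sketch of the kind of statement such a tDBRow component could run, assuming the files are already staged in S3; the table, bucket, delimiter, and credentials are placeholders:

BEGIN;
-- Empty the target, then bulk-load the staged S3 files in the same block.
TRUNCATE your_target_table;
COPY your_target_table
FROM 's3://your-bucket/staging/'
CREDENTIALS 'aws_access_key_id=<ACCESS_KEY>;aws_secret_access_key=<SECRET_KEY>'
DELIMITER '|'
GZIP;
END;

(One caveat: Redshift commits a TRUNCATE immediately, so if the truncate and the load must roll back together, a DELETE FROM inside the transaction is the alternative.)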

 

Regards

sushantk19
Creator
Author

@Kamil Kowalczyk: I used the tRedshiftOutputBulkExec component, but I get the below error:

 

Exception in component tDBOutputBulkExec_1_tROB (job_DM_002_DIM_Subscription_tmp1_history)

com.amazonaws.services.s3.model.AmazonS3Exception: The authorization header is malformed; a non-empty Access Key (AKID) must be provided in the credential. (Service: Amazon S3; Status Code: 400; Error Code: 

 

I am new to Talend. Can you tell me how to configure the S3 folder in AWS? I guess that if I use the above component, it will first copy the content to the S3 folder and then write to the target table, isn't it?

 

Also, can you suggest the steps to set up S3? I have attached a screenshot of the tDBOutputBulkExec component.

pakapi_
Creator

If you want to use S3, you have to know the credentials: the access key and the secret key.

Amazon has really great documentation. Please check this link:

https://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-data.html

I think you will find everything you need there.

sushantk19
Creator
Author

@Kamil Kowalczyk: Have you implemented this before? Can you share a screenshot of your S3 settings?

 

pakapi_
Creator

What exactly do you want to know? Sample screenshot:

(screenshot attached: 0693p000009rUvsAAE.png)

 

sushantk19
Creator
Author

@Kamil Kowalczyk: I provided the access key, bucket, and secret key.

The Key field I left blank, and for Region I chose DEFAULT.

 

I get the below error:

 

Exception in component tDBOutputBulkExec_1_tROB (job_DM_003_DIM_Subscription_tmp1_history)

com.amazonaws.SdkClientException: Unable to execute HTTP request: Connection reset by peer: socket write error

 

What could be the cause of this?


sushantk19
Creator
Author

Hello All,

 

Any inputs on my above issue? Is it mandatory to provide the Key value?

 

 

pakapi_
Creator

It looks like some firewall is blocking your traffic, or maybe you don't have permissions on your S3 bucket.

sushantk19
Creator
Author

The S3 access issue has been resolved now, but I get a new error:

Exception in component tDBOutputBulkExec_1_tRBE (job_DM_003_DIM_Subscription_tmp1_history)

java.sql.SQLException: [Amazon](500310) Invalid operation: Load into table 'payments_orders_subscriptions' failed. Check 'stl_load_errors' system table for details

 

I checked the ETL and it is the same as the regular flow without the BulkExec component. I checked the stl_load_errors table also; the error is attached.
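For anyone hitting the same error, a typical way to read the details is a query like the one below against the standard stl_load_errors system table:

-- Show the most recent load errors with the offending file, column, value, and reason.
SELECT starttime, filename, line_number, colname, type, raw_field_value, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;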