sushantk19
Creator

Improving the data load speed to an Amazon Redshift database

Hello Team,

I have a daily job which loads ~6 million rows daily (truncate and reload). My target database is Amazon Redshift, and I have used tRedshiftOutput as the output component. Presently the job takes somewhere between 4 and 5 hours to complete. Is there a way to improve the performance of this job? Can I use any other component to improve the performance? My source is a slightly complex SQL query (4 full outer joins).

1 Solution

Accepted Solutions
pakapi_
Creator

Hi sushantk19 (Customer),

 

That's great 🙂 So try to do the UNLOAD with PARALLEL and GZIP, then do the COPY with the GZIP parameter, and let me know if it helps 🙂
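For reference, a minimal SQL sketch of that pattern, assuming the source query can be run and UNLOADed from a Redshift cluster; the bucket, table name, and credentials below are placeholders, not values from this thread:

-- 1) Unload the source query result to S3 in parallel slices, gzip-compressed.
UNLOAD ('SELECT ... your source query ...')
TO 's3://your-bucket/staging/subscriptions_'
CREDENTIALS 'aws_access_key_id=<ACCESS_KEY>;aws_secret_access_key=<SECRET_KEY>'
GZIP
PARALLEL ON
ALLOWOVERWRITE;

-- 2) Copy the gzipped files back into the target table.
COPY your_target_table
FROM 's3://your-bucket/staging/subscriptions_'
CREDENTIALS 'aws_access_key_id=<ACCESS_KEY>;aws_secret_access_key=<SECRET_KEY>'
GZIP;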

 

 


20 Replies
pakapi_
Creator

Hi sushantk19,

 

You will get the best performance by using the COPY command in a tDBRow component. You can write the TRUNCATE TABLE and COPY commands in one BEGIN...END transaction block. One condition: your source data has to be on S3.
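A minimal sketch of the kind of statement such a tDBRow component could run, assuming the files are already staged in S3; the table, bucket, delimiter, and credentials are placeholders:

BEGIN;
-- Empty the target, then bulk-load the staged S3 files in the same block.
TRUNCATE your_target_table;
COPY your_target_table
FROM 's3://your-bucket/staging/'
CREDENTIALS 'aws_access_key_id=<ACCESS_KEY>;aws_secret_access_key=<SECRET_KEY>'
DELIMITER '|'
GZIP;
END;

(One caveat: Redshift commits a TRUNCATE immediately, so if the truncate and the load must roll back together, a DELETE FROM inside the transaction is the alternative.)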

 

Regards

sushantk19
Creator
Author

@Kamil Kowalczyk: I used the tRedshiftOutputBulkExec component, but I get the below error:

 

Exception in component tDBOutputBulkExec_1_tROB (job_DM_002_DIM_Subscription_tmp1_history)

com.amazonaws.services.s3.model.AmazonS3Exception: The authorization header is malformed; a non-empty Access Key (AKID) must be provided in the credential. (Service: Amazon S3; Status Code: 400; Error Code: 

 

I am new to Talend. Can you tell me how to configure the S3 folder in AWS? I guess that if I use the above component, it will first copy the content to the S3 folder and then write to the target table, isn't it?

 

Also, can you suggest the steps to set up S3? I have attached a screenshot of the tDBOutputBulkExec component.

pakapi_
Creator

If you want to use S3, you have to know the credentials: the access key and the secret key.

Amazon has really great documentation. Please check this link:

https://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-data.html

I think you will find everything you need there.

sushantk19
Creator
Author

@Kamil Kowalczyk: Have you implemented this before? Can you share a screenshot of your S3 settings?

 

pakapi_
Creator

What exactly do you want to know? Sample screenshot:

(screenshot attached: 0693p000009rUvsAAE.png)

 

sushantk19
Creator
Author

@Kamil Kowalczyk: I provided the access key, bucket, and secret key.

The Key field I left blank, and for Region I chose DEFAULT.

 

I get the below error:

 

Exception in component tDBOutputBulkExec_1_tROB (job_DM_003_DIM_Subscription_tmp1_history)

com.amazonaws.SdkClientException: Unable to execute HTTP request: Connection reset by peer: socket write error

 

What could be the cause of this?


sushantk19
Creator
Author

Hello All,

 

Any inputs on my above issue? Is it mandatory to provide the Key value?

 

 

pakapi_
Creator

It looks like some firewall is blocking your traffic, or maybe you don't have permissions on your S3 bucket.

sushantk19
Creator
Author

The S3 access issue has been resolved now, but I get a new error:

Exception in component tDBOutputBulkExec_1_tRBE (job_DM_003_DIM_Subscription_tmp1_history)

java.sql.SQLException: [Amazon](500310) Invalid operation: Load into table 'payments_orders_subscriptions' failed. Check 'stl_load_errors' system table for details

 

I checked the ETL and it is the same as the regular flow without the BulkExec component. I checked the stl_load_errors table also; the error is attached.
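For anyone hitting the same error, a typical way to read the details is a query like the one below against the standard stl_load_errors system table:

-- Show the most recent load errors with the offending file, column, value, and reason.
SELECT starttime, filename, line_number, colname, type, raw_field_value, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;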