Anonymous

Slow Insertion in Amazon Redshift

Hi,
We have just created a simple job that fetches 300,000 rows from a MySQL table (a local database and Amazon RDS) and inserts them into Redshift. It took more than 4 hours to complete.
1. Why is it so slow to fetch data from a single table and insert it into Amazon Redshift using Talend Open Studio for Big Data?
2. Is there a way to do a faster insertion, so that it completes in less than 5 minutes?
Please find the attached screenshots for details.
thanks!
24 Replies
Anonymous

Hi,
Did you set the "Commit every" option in tRedshiftOutput, and is there any complicated SQL query in your input component? What is your current rate?
Best regards
Sabrina
Anonymous

Hi,
Yes, "Commit every" is set to 10,000. There is no complicated query; it is a simple query, given below:
"select
`dim_ipdata_id` ,
`ipdata_ip` ,
`ipdata_isp` ,
`ipdata_org` ,
`ipdata_country` ,
`ipdata_city` ,
`ipdata_postal_code` ,
`ipdata_longitude` ,
`ipdata_latitude` ,
`ipdata_area_code` ,
`ipdata_metro_code` ,
`ipdata_category`
from dim_ipdata"
The current rate is 8 rows per second. Any idea what could be going wrong?
Anonymous

Hi,
8 rows per second is not a normal rate. I have looked at your screenshot and found that the tMap component is only used to map data, with no other action. For large data sets, tMap consumes a lot of memory. How about removing it, so that the workflow becomes tAmazonMysqlInput-->tRedshiftOutput, or storing the data on disk instead of in memory?
Best regards
Sabrina
Anonymous

Hi,
Thanks for the valuable suggestions.
I removed tMap so that the job is now tAmazonMysqlInput-->tRedshiftOutput, but even that didn't help. Regarding your second suggestion of storing data on disk, would I still have to use tMap for that, or is there another alternative?
Thanks!
Best Regards,
Ilyas
Anonymous

Hi,
I don't think the second suggestion will work for you: removing tMap made no difference, which means tMap is not the bottleneck.
For the "Commit every" option, is there any improvement if you change the value from "10,000"? Each commit consumes DB server resources, so the best value depends on your database.
In addition, resources differ between database servers, so there is no fixed standard.
Best regards
Sabrina
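
For illustration, here is a minimal JDBC sketch (not the code Talend generates) of the "commit every N rows" pattern against Redshift, assuming the Amazon Redshift JDBC driver is on the classpath; the endpoint, credentials, and synthetic data are hypothetical. It shows why the commit interval matters: every commit costs a network round trip plus commit work on the cluster, so committing too often dominates the run time.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CommitEveryDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical cluster endpoint and credentials.
        Connection conn = DriverManager.getConnection(
                "jdbc:redshift://example-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439/dev",
                "demo_user", "demo_password");
        conn.setAutoCommit(false); // commit manually, in batches

        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO dim_ipdata (dim_ipdata_id, ipdata_ip) VALUES (?, ?)");

        int commitEvery = 10000; // the value set in tRedshiftOutput's "Commit every"
        for (int i = 1; i <= 300000; i++) {
            ps.setInt(1, i);
            ps.setString(2, "10.0.0." + (i % 255)); // synthetic row for the sketch
            ps.addBatch();
            if (i % commitEvery == 0) {
                ps.executeBatch(); // send the accumulated rows
                conn.commit();     // round trip + commit cost on the cluster
            }
        }
        ps.executeBatch();
        conn.commit();
        ps.close();
        conn.close();
    }
}

Even with large batches, though, plain INSERTs into Redshift remain far slower than a bulk COPY from S3, so tuning "Commit every" alone is unlikely to reach the rates asked for above.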
Anonymous

Hi,
Yes, but even removing tMap may not help us in the long run, because it is the component that lets us manipulate strings, URLs, joins, variables, etc. For now we are just testing Talend with Redshift using the simplest possible data transformation.
I've changed "Commit every" from 10,000 to 1,000 and still no luck!
I used tRedshiftConnection to set up the connection and then set tRedshiftInput's connection to "Use existing connection", referencing that tRedshiftConnection component. But that kept giving me a NullPointerException, so I had to provide all the connection details inside tRedshiftInput and it no longer uses "Use existing connection". Could that be a problem?
Best Regards,
Ilyas
Anonymous

Hi,
To be honest, it is a very new component, and I'm building a testing environment to see if I can reproduce the same issue. I'll come back to you as soon as possible; sorry for the inconvenience.
Best regards
Sabrina
_AnonymousUser

Hi,
I've got exactly the same problem. Is there any solution? Inserting into Redshift is too slow.
Best Regards,
~Sergejs
_AnonymousUser

Hi there,
We have the same problem: 4 to 8 rows per second.
We tested different sources, such as MySQL and PostgreSQL, but got the same problem.
Next, we tried Talend Open Studio for Data Integration and Talend for Big Data, but the problem remained.
We also tried a PostgreSQL bulk insert, but it failed with an error like this:
"Exception in component tPostgresqlOutputBulkExec_1_tPBE
org.postgresql.util.PSQLException: ERROR: COPY CSV is not supported"
Any help, please?
Thanks!
Leo
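
The "COPY CSV is not supported" error is expected: Redshift does not implement PostgreSQL's COPY ... FROM STDIN, so the PostgreSQL bulk components cannot be pointed at it. Redshift's fast load path is COPY from Amazon S3: stage the rows as (optionally gzipped) CSV files in a bucket, then run a single COPY statement; this is what Talend's Redshift bulk components (tRedshiftOutputBulk / tRedshiftBulkExec, where available in your version) do. Below is a minimal JDBC sketch, assuming the files are already staged in S3; the endpoint, bucket, and credentials are hypothetical placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RedshiftCopyDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical cluster endpoint and credentials.
        Connection conn = DriverManager.getConnection(
                "jdbc:redshift://example-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439/dev",
                "demo_user", "demo_password");
        try (Statement st = conn.createStatement()) {
            // COPY loads the staged files from S3 in parallel across the cluster,
            // which is orders of magnitude faster than row-by-row INSERTs.
            st.execute(
                "COPY dim_ipdata "
              + "FROM 's3://example-bucket/dim_ipdata/' "
              + "CREDENTIALS 'aws_access_key_id=AKIA...;aws_secret_access_key=...' "
              + "CSV GZIP");
        }
        conn.close();
    }
}

Loading 300,000 rows this way typically finishes in seconds; the remaining work is exporting the MySQL rows to CSV and uploading them to S3 (for example with tFileOutputDelimited and tS3Put, if available in your edition).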