
Anonymous
2013-05-06
11:25 AM
Slow Insertion in Amazon Redshift
Hi,
We have just created a simple job to fetch data from a MySQL table (both a local database and Amazon RDS) with 300,000 rows and to insert those rows into Redshift. It took us more than 4 hours.
1. Why is it so slow to fetch data from a single table and insert it into Amazon Redshift using Talend Open Studio for Big Data?
2. Is there a way to do a faster insertion, ideally in less than 5 minutes?
Please find the attached screenshots for details.
Thanks!

Anonymous
2013-05-23
03:15 AM
Hello,
We are facing the same problem. Our MySQL database is installed on Amazon EC2 (in the same region as our Redshift instance).
I have set the "Commit every" option to 10000 in the tRedshiftOutput component and am not using any tMap component. It is also a plain SELECT statement from MySQL.
For 10,300 rows (the table is only about 10 MB in MySQL) it took about 7-8 minutes, and for 440,000 rows (about 50 MB) it took about 7 hours.
I have tried the JDBC output component as well, but it didn't make any difference.
Is there any way to improve performance while using the Redshift component?
Right now the best approach I have found is to write the output to a flat file, upload it to an S3 bucket, and use the COPY command to load it into Redshift. That takes less than a minute end to end, but it is not very convenient and requires an external script.
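For anyone curious, here is a minimal sketch of that flat-file-to-S3-to-COPY path outside Talend, assuming Python with boto3 and psycopg2; the bucket, table, endpoint, and credential values are placeholders, not details from this thread:
```python
# Sketch of the flat file -> S3 -> COPY workaround described above.
# boto3, psycopg2 and every name below are placeholders for illustration.
import boto3
import psycopg2

LOCAL_FILE = "export.csv"            # flat file produced by the Talend job
BUCKET = "my-staging-bucket"         # hypothetical S3 bucket
KEY = "redshift/export.csv"

# 1. Upload the extracted file to S3.
boto3.client("s3").upload_file(LOCAL_FILE, BUCKET, KEY)

# 2. Run COPY on Redshift so the file is loaded in bulk.
conn = psycopg2.connect(
    host="my-cluster.redshift.amazonaws.com",  # hypothetical endpoint
    port=5439, dbname="dev", user="admin", password="secret",
)
with conn, conn.cursor() as cur:
    cur.execute(
        f"""
        COPY my_table
        FROM 's3://{BUCKET}/{KEY}'
        CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
        CSV;
        """
    )
conn.close()
```
The point is that COPY loads the whole file in one bulk operation instead of issuing one INSERT per row.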
Thanks
Aditya

Anonymous
2013-05-23
04:34 AM
Hi Aditya,
It would be appreciated if you could open a JIRA issue in the Talend DI project of the JIRA bug tracker. Our developers will check whether it is a bug and provide a solution.
Please post the JIRA issue link on the forum so that other community users can find it.
Best regards
Sabrina

Anonymous
2013-05-23
02:09 PM
All,
I have reported this behaviour on JIRA; our R&D team will investigate.
The issue URL is: https://jira.talendforge.org/browse/TDI-26155
Regards,

Anonymous
2013-05-24
12:21 AM
Hi All,
Please vote for the JIRA issue https://jira.talendforge.org/browse/TDI-26155 created by adiallo and add your comments to it.
Best regards
Sabrina

Anonymous
2013-05-24
10:24 AM
Hi,
The current component uses single INSERT statements (one per row) to write into Redshift. This approach is highly inefficient according to the Redshift documentation and best practices.
There are several ways to fix this issue. One of them is the COPY command, which loads data files located on S3 or DynamoDB; you can run this command with the tRedshiftRow component. Another is the multi-row INSERT, which is going to be implemented by R&D under TDI-26155.
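For illustration only, a rough sketch (in Python with psycopg2, outside Talend) of the difference between per-row INSERTs and a single multi-row INSERT; the table, columns, and connection details are assumptions, not part of this thread:
```python
# Rough illustration of single-row INSERTs versus one multi-row INSERT.
# psycopg2 and all table/column/connection names here are assumptions.
import psycopg2

rows = [(1, "a"), (2, "b"), (3, "c")]   # data already extracted from MySQL

conn = psycopg2.connect(host="my-cluster.redshift.amazonaws.com", port=5439,
                        dbname="dev", user="admin", password="secret")
cur = conn.cursor()

# Slow pattern (what the component does today): one statement per row.
# for r in rows:
#     cur.execute("INSERT INTO my_table (id, val) VALUES (%s, %s)", r)

# Faster pattern: a single INSERT carrying many VALUES tuples.
values = ",".join(cur.mogrify("(%s, %s)", r).decode() for r in rows)
cur.execute("INSERT INTO my_table (id, val) VALUES " + values)

conn.commit()
cur.close()
conn.close()
```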
Rémy.

Anonymous
2013-11-28
09:08 AM
Hi,
Have any improvements been made to the Redshift components in the new version of Talend Big Data 5.4?
BR!

Contributor
2014-10-07
11:13 AM
Any news on this? I am interested in using Talend to ETL data from MySQL into Redshift.
I have gotten much faster performance by using Talend to pump files out to S3 and then using Amazon tools to pipe them into Redshift. The issue is that large files still take a while, with a lot of I/O going to file and then up to the cloud. One could use Amazon's Data Pipeline, I suppose, but then we lose the rich features of Talend transformations.

Contributor III
2015-03-25
08:35 AM
I think these connectors don't have the bulk feature. On the input you're not able to set a cursor size, and on the output you're not able to set a batch size. Try using the regular MySQL/PostgreSQL components, which do have these features.
We had something similar with Greenplum.
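As a loose illustration of what those two settings mean outside Talend, here is a sketch that streams rows from MySQL in chunks (the cursor size) and writes them to the target in batches (the batch size); mysql.connector, psycopg2, and all connection and table names are assumptions made up for the example:
```python
# Sketch of cursor-size reads and batch-size writes, outside Talend.
# All libraries, hosts, credentials and table names are assumptions.
import mysql.connector
import psycopg2

FETCH_SIZE = 10000   # how many rows to pull from MySQL per round trip
BATCH_SIZE = 1000    # how many rows to hand to the target per write call

src = mysql.connector.connect(host="mysql-host", user="user",
                              password="secret", database="source_db")
dst = psycopg2.connect(host="target-host", port=5439, dbname="dev",
                       user="admin", password="secret")

src_cur = src.cursor()
dst_cur = dst.cursor()
src_cur.execute("SELECT id, val FROM source_table")

while True:
    chunk = src_cur.fetchmany(FETCH_SIZE)
    if not chunk:
        break
    for i in range(0, len(chunk), BATCH_SIZE):
        dst_cur.executemany(
            "INSERT INTO target_table (id, val) VALUES (%s, %s)",
            chunk[i:i + BATCH_SIZE],
        )
    dst.commit()

dst.close()
src.close()
```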

Anonymous
2015-10-29
02:09 PM
Now that the bulk feature exists - how do we connect to it?

Anonymous
2016-06-28
03:44 AM
Hi, did anyone find a solution for this? I am facing the same problem.
I am reading data from MySQL and loading it into Redshift, but the jobs are too slow.
