Hi,
We have just created a simple job to fetch data from a MySQL table (both a local database and Amazon RDS) containing 300,000 rows and to insert those rows into Redshift. It took us more than 4 hours.
1. Why is it so slow to fetch data from a single table and insert it into Amazon Redshift using Talend Open Studio for Big Data?
2. Is there a way to make the insertion faster, ideally under 5 minutes?
Please find the attached screenshots for details.
thanks!
Hello,
We are facing the same problem. Our MySQL database is installed on Amazon EC2 (in the same region as our Redshift instance).
I have set the "Commit every" option to 10000 in the tRedshiftOutput component and I am not using any tMap component; the input is just a plain SELECT statement from MySQL.
For 10,300 rows (the table is only about 10 MB in MySQL) it took about 7-8 minutes, and for 440,000 rows (about 50 MB) it took about 7 hours.
I have tried the JDBC output component as well, but it didn't make any difference.
Any solution for increasing the performance while using the Redshift component?
Right now the best approach I have found is to write the output to a flat file, upload it to an S3 bucket, and use the COPY command to load it into Redshift. That takes less than a minute end to end, but it is not very convenient and requires an external script.
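For reference, here is a rough sketch of that flat-file / S3 / COPY workaround in Python (host names, bucket, table and IAM role below are placeholders, and it assumes pymysql, boto3 and psycopg2 are available); it is a simplified illustration, not production code:

import csv
import gzip

import boto3      # AWS SDK for Python, used for the S3 upload
import pymysql    # MySQL client used for the extract step
import psycopg2   # PostgreSQL driver; Redshift speaks the PostgreSQL wire protocol

# 1. Dump the MySQL table to a gzipped CSV file.
mysql_conn = pymysql.connect(host="mysql-host", user="etl", password="secret", database="sales")
with mysql_conn.cursor() as cur, gzip.open("orders.csv.gz", "wt", newline="") as f:
    cur.execute("SELECT * FROM orders")
    writer = csv.writer(f)
    for row in cur:
        writer.writerow(row)
mysql_conn.close()

# 2. Upload the file to S3.
s3 = boto3.client("s3")
s3.upload_file("orders.csv.gz", "my-etl-bucket", "staging/orders.csv.gz")

# 3. Load it into Redshift with a single COPY statement.
rs_conn = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com", port=5439,
                           dbname="analytics", user="etl", password="secret")
with rs_conn, rs_conn.cursor() as cur:
    cur.execute("""
        COPY orders
        FROM 's3://my-etl-bucket/staging/orders.csv.gz'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        CSV GZIP;
    """)
rs_conn.close()

COPY lets the Redshift cluster load the file in bulk on its side, which is why this path is so much faster than row-by-row INSERTs.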
Thanks
Aditya
Hi Aditya,
It would be appreciated if you could open a JIRA issue in the Talend DI project of the JIRA bugtracker. Our developers will check whether it is a bug and provide a solution.
Please post the JIRA issue link on the forum so that other community users know about it.
Best regards
Sabrina
Hi,
The current component uses single-row INSERT statements to write into Redshift, which is very inefficient according to the Redshift documentation and best practices.
There are several ways to fix this. One is the COPY command, which loads data files located on S3 or DynamoDB; you can run it from the tRedshiftRow component. Another is the multi-row INSERT, which is going to be implemented by R&D in TDI-26155.
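To illustrate the difference, here is a minimal sketch of a multi-row INSERT against Redshift using psycopg2's execute_values (table, columns and connection details are made up for the example); the point is simply that thousands of rows travel in one statement instead of one statement per row:

import psycopg2
from psycopg2.extras import execute_values

rows = [(1, "widget", 9.99), (2, "gadget", 4.50), (3, "gizmo", 12.00)]  # e.g. fetched from MySQL

conn = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com", port=5439,
                        dbname="analytics", user="etl", password="secret")
with conn, conn.cursor() as cur:
    # execute_values expands the single %s placeholder into a long
    # VALUES (...), (...), ... list, so many rows are sent in one
    # INSERT statement instead of one statement per row.
    execute_values(
        cur,
        "INSERT INTO products (id, name, price) VALUES %s",
        rows,
        page_size=1000,   # rows per generated INSERT statement
    )
conn.close()

Even so, for large loads the COPY path is usually the faster option, which is in line with the best practices mentioned above.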
Rémy.
Any news on this? I am interested in using Talend to ETL from MySQL into Redshift. I have gotten much faster performance by using Talend to write files out to S3 and then using Amazon tools to pipe them into Redshift. The issue is that large files still take a while, with a lot of I/O to write to disk and then upload to the cloud. One could use Amazon's Data Pipeline, I suppose, but then we lose the rich transformation features of Talend...
I think these connectors don't have a bulk feature: on the input side you can't set a cursor size, and on the output side you can't set a batch size. Try using the regular MySQL/PostgreSQL components, which do have these options. We had a similar issue with Greenplum.
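To make the cursor-size / batch-size point concrete, here is a small sketch (assuming pymysql; host, credentials and query are placeholders): a server-side cursor streams rows from MySQL in fixed-size chunks instead of buffering the whole result set in memory, and each chunk can then be handed to a bulk writer such as the execute_values or COPY examples earlier in the thread:

import pymysql
import pymysql.cursors

conn = pymysql.connect(host="mysql-host", user="etl", password="secret",
                       database="sales",
                       cursorclass=pymysql.cursors.SSCursor)  # unbuffered, server-side cursor

BATCH_SIZE = 10000  # plays the role of the "cursor size" / "batch size" settings

with conn.cursor() as cur:
    cur.execute("SELECT id, name, price FROM products")
    while True:
        batch = cur.fetchmany(BATCH_SIZE)   # pull rows in fixed-size chunks
        if not batch:
            break
        # hand the chunk to a bulk writer, e.g. execute_values() or a
        # COPY-based loader as sketched earlier in this thread
        print(f"fetched {len(batch)} rows")
conn.close()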
Hi, did anyone find a solution for this? I am facing the same problem reading data from MySQL and loading it into Redshift, but the jobs are too slow...