Anonymous

Design to load millions of rows in minutes

Hi Everyone,
We have around 75 million rows in a source Redshift table that we plan to load into a MySQL table.
Could someone please suggest a better design to load this in minutes through a Talend job? It is currently taking a long time, even though we already split the source data into chunks (for example, processing 400,000 rows at a time and looping through the remaining chunks).
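For context, each chunk is pulled with a keyset-style query along these lines (table and column names here are placeholders, not our real schema):

-- Fetch the next 400,000 rows after the highest key seen in the previous chunk
SELECT id, col_a, col_b, updated_at
FROM source_table
WHERE id > ?        -- last id of the previous chunk; 0 for the first chunk
ORDER BY id
LIMIT 400000;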
I need to first update existing rows (if anything changed in the source) and otherwise insert new ones.
Both the update and the insert are performed with SQL statements in tMysqlRow rather than tMap lookups etc., and the job still takes a long time.
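In MySQL terms, the per-row logic we need is equivalent to an upsert like this (again, all names are illustrative):

INSERT INTO target_table (id, col_a, col_b, updated_at)
VALUES (?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
  col_a      = VALUES(col_a),
  col_b      = VALUES(col_b),
  updated_at = VALUES(updated_at);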
It would be very helpful if anyone could suggest the best approach to update-or-insert logic over millions of rows.

We are using Talend Enterprise Data Integration edition 6.2.1.

Thanks,
kmrx
1 Reply
Anonymous (Author)

You need to give us more info.
1) How fast is it at the moment?
2) How fast do you want it?
3) Are the job, Redshift and MySQL in the same environment, or does the data have to cross the internet?
4) Are indexes (in MySQL) used? Can they be switched off for the load?
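(On point 4: for a MyISAM target, non-unique index maintenance can be suspended around the load as below; InnoDB ignores DISABLE KEYS, so there you would drop and recreate the secondary indexes instead. target_table is a placeholder.)

ALTER TABLE target_table DISABLE KEYS;
-- ... run the bulk load here ...
ALTER TABLE target_table ENABLE KEYS;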

You say you want 75,000,000 rows processed in minutes. You do realise that doing this in 1 hour (for example) would require roughly 20,833 rows per second? That is not impossible, but it would still be hard to hit in an hour with a single MySQL instance that is remote from both the Redshift box and the job.
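If the environments allow it, the usual way to get anywhere near that rate is to stop doing row-by-row upserts altogether: bulk-load each chunk into an unindexed staging table and merge it into the target in one set-based statement. A rough sketch, with all table, column and file names assumed:

-- 1) Bulk-load the chunk (e.g. a CSV the job wrote out) into a bare staging table
LOAD DATA LOCAL INFILE '/tmp/chunk.csv'
INTO TABLE staging_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, col_a, col_b, updated_at);

-- 2) Merge staged rows into the target: insert new keys, update changed ones
INSERT INTO target_table (id, col_a, col_b, updated_at)
SELECT id, col_a, col_b, updated_at
FROM staging_table
ON DUPLICATE KEY UPDATE
  col_a      = VALUES(col_a),
  col_b      = VALUES(col_b),
  updated_at = VALUES(updated_at);

-- 3) Empty the staging table for the next chunk
TRUNCATE TABLE staging_table;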