Anonymous
Not applicable

Big data tables - How to deal with a 152-million-row MySQL table (insert or update data action)

Hello all,

I have a very big MySQL table which contains 152 million rows (40 GB).

I have CSV files with the same schema that need to be updated/inserted into the MySQL table.

The problem is that the MySQL table is so big that it takes 366 seconds to process one row of the CSV file.

How can I speed this up to around 1,000 rows per second? My CSV files may contain more than 2 million rows.

 

Thanks

2 Replies
Anonymous
Not applicable
Author

Hi,

 

You will have to do the update/insert based on the primary key. If you are already using the primary key, then you will have to look into performance tuning for the table.

 

Did you check with your DBAs about possible performance tuning exercises?
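For illustration, here is a minimal sketch of a primary-key-based upsert done in batches with MySQL's INSERT ... ON DUPLICATE KEY UPDATE, using Python and mysql-connector-python. The connection settings, table name (big_table), columns (id, col1, col2) and CSV path are placeholders, not taken from the original post.

import csv
import mysql.connector  # pip install mysql-connector-python

BATCH_SIZE = 5000  # commit every few thousand rows, not per row

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="mydb")
cur = conn.cursor()

# Upsert keyed on the primary key: new ids are inserted, existing ids updated.
upsert_sql = (
    "INSERT INTO big_table (id, col1, col2) VALUES (%s, %s, %s) "
    "ON DUPLICATE KEY UPDATE col1 = VALUES(col1), col2 = VALUES(col2)"
)

with open("input.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    batch = []
    for row in reader:
        batch.append((row[0], row[1], row[2]))
        if len(batch) >= BATCH_SIZE:
            cur.executemany(upsert_sql, batch)  # one statement per batch
            conn.commit()
            batch = []
    if batch:
        cur.executemany(upsert_sql, batch)
        conn.commit()

cur.close()
conn.close()

Batching the statements and committing per batch rather than per row is usually what moves the throughput from a few rows per second into the thousands.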

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved.

rmartin2
Creator II

Hi,

 

You clearly need to put an index on the update key.
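For reference, a minimal sketch of adding such an index from Python, assuming a single-column update key; the table and column names are placeholders (and if the update key is already the primary key, it is already indexed):

import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="mydb")
cur = conn.cursor()

# Without an index on the update key, every update scans the whole table.
# Note: building an index on a ~150M-row table can itself take a while.
cur.execute("CREATE INDEX idx_update_key ON big_table (update_key)")

cur.close()
conn.close()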

Another "dirty" way to improve your preformance, if it's not enough : perform the insert (and don't kill connection on error), and then an update.

 

Example:

you have 1,000 rows to put into your table

800 are new

200 are already there

You will try to:

 * insert 1,000 rows => you will get 800 successes and 200 errors to discard

 * update 1,000 rows => it will update the 800 freshly inserted rows and the 200 already existing ones, and it can't produce an error unless there's a real issue with the insert.

 

This "dirty fix" is made to avoid a lack of management in prepared statement when having multiple operation (update/insert for example)

 

But for the indexes, you should let a DBA check your data to create the best possible index.

150M rows is not a lot, and you can easily achieve a decent 10k rows/s depending on your server.