Hi,
I have about a billion records in a MySQL database. I want to read the records batch-wise (say 1,000 records at a time), map each batch to the desired columns, and write the result out as Excel or XML files, with each file containing 100,000 (1 lakh) rows. I found this link, which is close to what I want:
https://help.talend.com/reader/Dx4Xf8ykIvnjCCGyFrnUWw/aqGkWGJGV3o7u_MYna4K4A
I followed it and created a job as shown below:
However, it is taking a long time even for small record counts: a run of 200,000 (2 lakh) records took almost 1 hour 45 minutes. Please suggest another approach, or any changes I should make to this job.
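For reference, the pattern I am trying to reproduce looks roughly like this in plain JDBC (a sketch only: the connection URL, table name big_table, and columns col1/col2 are placeholders, and my real job uses tMysqlInput, tMap, and a file output component):

```java
import java.io.PrintWriter;
import java.sql.*;

public class BatchExport {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; useCursorFetch enables row streaming.
        String url = "jdbc:mysql://localhost:3306/mydb?useCursorFetch=true";
        int rowsPerFile = 100_000; // 1 lakh rows per output file

        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // Fetch 1,000 rows at a time instead of buffering the whole result set.
            stmt.setFetchSize(1000);

            try (ResultSet rs = stmt.executeQuery("SELECT col1, col2 FROM big_table")) {
                int rowInFile = 0, fileIndex = 0;
                PrintWriter out = newFile(fileIndex++);
                while (rs.next()) {
                    if (rowInFile == rowsPerFile) { // roll over to the next file
                        out.close();
                        out = newFile(fileIndex++);
                        rowInFile = 0;
                    }
                    out.println(rs.getString("col1") + "," + rs.getString("col2"));
                    rowInFile++;
                }
                out.close();
            }
        }
    }

    private static PrintWriter newFile(int index) throws Exception {
        return new PrintWriter("export_" + index + ".csv");
    }
}
```

The setFetchSize call together with useCursorFetch=true makes the MySQL driver stream rows instead of holding the entire result set in memory.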
Thanks and regards.
Hi,
In your previous post, you mentioned that you plan to use only selected columns:
"First I'm reading records from MySQL and mapping using tMap to get desired columns (say I'm considering 50 columns out of 200)"
If you do not need all the columns, do not fetch them from the DB in the first place. Selecting only the columns you need reduces the size of the result set and thereby improves performance as well.
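As an illustration (the table and column names below are made up), the difference in the query you configure in the input component would look like this:

```java
// Query typed into the tMysqlInput component (placeholders only).

// Fetching all 200 columns and discarding 150 of them later in tMap
// forces MySQL to send data you never use:
String allColumns = "SELECT * FROM big_table";

// Listing only the ~50 columns that tMap actually maps shrinks the
// result set before it ever leaves the database:
String neededColumns = "SELECT col1, col2, col3 /* ..., col50 */ FROM big_table";
```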
Warm Regards,
Nikhil Thampi
Hi,
You can implement parallelism in several ways. For example, you can create a job that takes the bounds of an SQL WHERE clause as context parameters and writes the matching rows to an output file.
Once you have a job like this, you can run multiple instances of it from a parent job using the tParallelize component, as sketched below.
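For illustration, here is a rough plain-Java sketch of that partitioning idea (the id column, row count, worker count, and connection details are all assumptions; in Talend the equivalent is a child job that receives the range bounds as context variables and is launched in parallel by tParallelize):

```java
import java.sql.*;
import java.util.concurrent.*;

public class ParallelExport {
    // Each worker exports one id range; this mirrors one instance of the
    // child job receiving its WHERE-clause bounds as context parameters.
    static void exportRange(long from, long to) throws Exception {
        String url = "jdbc:mysql://localhost:3306/mydb?useCursorFetch=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT col1, col2 FROM big_table WHERE id >= ? AND id < ?")) {
            ps.setFetchSize(1000);
            ps.setLong(1, from);
            ps.setLong(2, to);
            try (ResultSet rs = ps.executeQuery()) {
                // ... write this slice to its own file, as in the single-threaded sketch ...
            }
        }
    }

    public static void main(String[] args) throws Exception {
        long totalRows = 1_000_000L; // placeholder span of primary-key values
        int workers = 4;             // parallel instances, like tParallelize branches
        long slice = totalRows / workers;

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            final long from = i * slice;
            final long to = (i == workers - 1) ? totalRows : from + slice;
            pool.submit(() -> { exportRange(from, to); return null; });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }
}
```

Splitting on a indexed key such as the primary key keeps each worker's WHERE clause cheap, so the parallel reads do not fight each other with full table scans.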
Warm Regards,
Nikhil Thampi