Blade
Contributor

tLoop performance

Hi Talend contributors,

My work requires a job that reads about 5 million records from a CSV file. To avoid out-of-memory problems, I designed my job like this:

tFileRowCount => tLoop (loop 1000 records per time) => tFileInputDelimited => tDbOutput

So the idea is to avoid memory overload by reading the CSV 1,000 records per iteration of the loop.

But I am not sure how tLoop manages memory.

So my question is: does tLoop automatically release the memory from the previous chunk each time it starts a new round of the loop?

Expressed as Java code:

for (int i = 1; i <= 5_000_000; i += 1000) {

// read CSV rows i through i + 999 and write them to the DB

}

=> In this case, the job only ever holds 1,000 records in memory at a time, right?
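To make the question concrete, here is a runnable sketch of that loop-with-offset pattern in plain Java (the class name, sample file, and chunk logic are my own illustration, not Talend's generated code). Each iteration holds at most one 1,000-row chunk, mirroring what tLoop driving tFileInputDelimited with a header offset would do:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class ChunkedCsvLoad {
    static final int CHUNK = 1000;

    public static void main(String[] args) throws IOException {
        // A small sample CSV standing in for the real 5-million-row file.
        Path csv = Files.createTempFile("rows", ".csv");
        List<String> sample = new ArrayList<>();
        for (int i = 1; i <= 3500; i++) sample.add("row," + i);
        Files.write(csv, sample);

        long totalRows;                               // what tFileRowCount provides
        try (Stream<String> s = Files.lines(csv)) {
            totalRows = s.count();
        }

        long loaded = 0;
        // Each iteration re-opens the file, skips 'offset' lines, reads one chunk.
        for (long offset = 0; offset < totalRows; offset += CHUNK) {
            List<String> chunk = new ArrayList<>(CHUNK);
            try (BufferedReader r = Files.newBufferedReader(csv)) {
                for (long skip = 0; skip < offset; skip++) r.readLine(); // skipped lines are not retained
                String line;
                while (chunk.size() < CHUNK && (line = r.readLine()) != null) {
                    chunk.add(line);
                }
            }
            loaded += chunk.size();                   // stand-in for the DB output step
            // 'chunk' becomes garbage here, so at most CHUNK rows are live at once
        }
        System.out.println(loaded);                   // prints 3500
        Files.delete(csv);
    }
}
```

One cost of this pattern worth noting: because every iteration re-opens the file and skips the rows already processed, the job as a whole re-reads earlier lines over and over (roughly quadratic in the row count). If per-row memory is the only concern, streaming the file once from start to finish keeps memory just as flat without the repeated skipping.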

Thank you!

3 Replies
CLi1594691515
Contributor

Hi,

 

In my opinion, reading 5 million records and loading them into the database will not cause an out-of-memory problem.

You may simply use a bulk-load component to load the data into your database, which is much faster than a row-by-row DB output component.
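To illustrate why loading in fixed-size batches keeps memory flat no matter how many rows arrive, here is a minimal sketch; the BatchedWriter class and its Consumer sink are hypothetical stand-ins for a JDBC batch execute or a database bulk loader, not actual Talend components:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedWriter {
    private final int batchSize;
    private final Consumer<List<String>> sink;  // stand-in for a JDBC executeBatch / bulk load
    private final List<String> buffer;
    private int flushes = 0;

    public BatchedWriter(int batchSize, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    public void write(String row) {
        buffer.add(row);
        if (buffer.size() == batchSize) flush();  // never more than batchSize rows held in memory
    }

    public void flush() {
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer));
            buffer.clear();
            flushes++;
        }
    }

    public int getFlushes() { return flushes; }

    public static void main(String[] args) {
        int[] written = {0};
        BatchedWriter w = new BatchedWriter(1000, batch -> written[0] += batch.size());
        for (int i = 0; i < 5432; i++) w.write("row," + i);
        w.flush();                                 // flush the final partial batch
        System.out.println(written[0] + " rows in " + w.getFlushes() + " batches");
        // prints: 5432 rows in 6 batches
    }
}
```

The memory high-water mark here is one batch, regardless of whether 5 million or 5 billion rows flow through.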

 

Best Regards,

Andy

Blade
Contributor
Author

Hi @Chun Yin Li, thank you for the suggestion. 5 million is just an example 😅. Suppose we don't know how many input records there are; it could be 5 billion or 100 billion. So there should be a better way to handle very big data like this.

CLi1594691515
Contributor

Hi,

 

Regarding your concern, I think you may apply the following, if storage is not a concern:

  1. Use tFileRowCount to check whether the row count exceeds a certain threshold; if not, load the file into the DB directly.
  2. Otherwise, use tFileInput -> tFileOutput (with the "Split output in several files" option) to split the data into smaller files, and load them into the DB one by one.
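A rough sketch of step 2 above, splitting a large CSV into fixed-size part files before loading, assuming plain Java in place of the Talend components; the class, method, and file names are illustrative:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CsvSplitter {
    // Split 'input' into files of at most 'rowsPerFile' lines each; returns the part files.
    static List<Path> split(Path input, Path outDir, int rowsPerFile) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            String line = reader.readLine();
            while (line != null) {
                Path part = outDir.resolve("part_" + parts.size() + ".csv");
                try (BufferedWriter w = Files.newBufferedWriter(part)) {
                    int rows = 0;
                    while (line != null && rows < rowsPerFile) {
                        w.write(line);
                        w.newLine();
                        rows++;
                        line = reader.readLine();
                    }
                }
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("split");
        Path big = dir.resolve("big.csv");
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) rows.add("row," + i);
        Files.write(big, rows);

        List<Path> parts = split(big, dir, 1000);  // parts of 1000, 1000, 500 rows
        System.out.println(parts.size());          // prints 3
        // each part would then be loaded into the DB one at a time
    }
}
```

Loading part files one at a time also limits the blast radius of a failure: a crashed load only requires re-running one part, not the whole input.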