Hi,
I have a few questions:
Q1. Can Talend process 60-80 million records?
Q2. What is the best approach, performance-wise, if I have 4-6 million records in a lookup?
Q3. Which is the better option for storing 4-6 million lookup records: a file or a DB?
Regards.
Hi, There is no standard answer for the maximum volume of data Talend can handle; it depends on project scale, job design, data source, and so on. Do you want to handle the lookups with a SQL query, or with a tMap join? Please provide the details. ELT components (when all tables are in the same DB) and bulk loading are faster ways to load large volumes of data. Best regards Sabrina
Probably a late reply, but it might be useful for others.
Q1. Assuming you are reading from an RDBMS table: yes, but the read should be in streaming mode to avoid a Java heap error. I have loaded 33 million records, extracted from a 1-billion-record table in MySQL, through Talend.
Q2. For better performance, consider both the total input volume and the output record count. In the scenario above, I needed only 33 million of the 1 billion records, so instead of a Talend lookup I used an inner join in the extract query, with streaming mode. A lookup table's data must fit in the server's RAM, so the maximum size of the lookup table/file depends on the RAM of the server on which Talend is installed.
Q3. I advise you to store the lookup data in the DB and join it in the extract query, since the lookup data is large.
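To illustrate the approach above, here is a minimal JDBC sketch (the table, column, and connection details are hypothetical, not from the original posts). With MySQL's Connector/J, creating a forward-only, read-only statement and setting the fetch size to Integer.MIN_VALUE tells the driver to stream rows one at a time instead of buffering the whole result set in the Java heap, and the lookup join is pushed into the extract query so the database does the matching:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamExtract {

    // Extract query that pushes the lookup join into the database.
    // Table and column names here are made up for illustration.
    static String buildExtractQuery() {
        return "SELECT f.id, f.amount, l.category "
             + "FROM fact_table f "
             + "INNER JOIN lookup_table l ON f.lookup_id = l.id";
    }

    public static void main(String[] args) throws SQLException {
        // Connection URL/credentials are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "pass");
             Statement st = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {

            // MySQL Connector/J convention: this hint enables row-by-row
            // streaming instead of loading the full result set into memory.
            st.setFetchSize(Integer.MIN_VALUE);

            try (ResultSet rs = st.executeQuery(buildExtractQuery())) {
                while (rs.next()) {
                    // Process one row at a time; heap usage stays flat
                    // regardless of how many rows the query returns.
                }
            }
        }
    }
}
```

A Talend tDBInput (tMysqlInput) component with "Enable stream" checked does the equivalent of the fetch-size setting above.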
Thanks,
Srini,
AgilitX