
Anonymous

[resolved] Parallelization in Talend

Hi,
I tested the above test case (reading from Excel, sorting, and writing to a file, as described in the article at
https://help.talend.com/search/all?query=How+to+automatically+enable+parallelization+of+data+flows+f... ), and the results are as follows.
My configuration is:
i3 processor (4 logical cores)
4 GB RAM
Test details                                     | 1 thread (s) | 3 threads (s) | 2 threads (s) | Rows
Reading from Excel and writing to DB             | 25           | 29            | 26            | 190853
Reading from Excel and writing to file           | 3            | 15            | 5             | 190853
Reading from Excel and writing to DB             | 58           | 59            | 59            | 381706
Reading from Excel, sorting, and writing to file | 16           | 81            | 21            | 381706
Reading from Excel, sorting, and writing to file | 8            | 9             | 8             | 190853
 
It seems that enabling parallelization actually makes the jobs slower. What, then, is the use of parallelization?
 
Please explain.
 
Thanks,
Pankaj
(screenshot of the job attached)
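The pattern in the table is consistent with parallelization overhead: splitting a flow across threads adds partitioning, thread startup/coordination, and merge costs, which only pay off when the per-partition work is large enough. A minimal standalone Java sketch of the effect (illustrative code of my own, not Talend-generated; the class and method names are made up):

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelOverheadDemo {

    public static void main(String[] args) throws Exception {
        int[] data = new Random(42).ints(200_000).toArray();

        // Baseline: single-threaded sort.
        int[] a = data.clone();
        long t0 = System.nanoTime();
        Arrays.sort(a);
        long singleMs = (System.nanoTime() - t0) / 1_000_000;

        // "Parallel" version: partition, sort each half on its own thread,
        // then merge. Partitioning, thread startup, and the final merge are
        // pure overhead that only a larger workload would amortize.
        int[] b = data.clone();
        int mid = b.length / 2;
        t0 = System.nanoTime();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<?> left = pool.submit(() -> Arrays.sort(b, 0, mid));
        Future<?> right = pool.submit(() -> Arrays.sort(b, mid, b.length));
        left.get();
        right.get();
        pool.shutdown();
        int[] merged = merge(b, mid);
        long parallelMs = (System.nanoTime() - t0) / 1_000_000;

        if (!Arrays.equals(a, merged)) {
            throw new IllegalStateException("merge is wrong");
        }
        // Which one wins depends on input size and core count.
        System.out.printf("single: %d ms, 2 threads: %d ms%n", singleMs, parallelMs);
    }

    // Merge the two sorted halves b[0..mid) and b[mid..length) into one array.
    static int[] merge(int[] b, int mid) {
        int[] out = new int[b.length];
        int i = 0, j = mid, k = 0;
        while (i < mid && j < b.length) out[k++] = b[i] <= b[j] ? b[i++] : b[j++];
        while (i < mid) out[k++] = b[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }
}
```

On a small or cheap workload the two-thread version can easily lose to the single-threaded one, which matches the timings measured above for the lighter test cases.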
11 Replies
Anonymous
Author

Hi RRaj,
The rows/s figure is the row rate, i.e., the number of rows processed per second.
Did you only use the tSort component in your job? What is your target output: a database or a flat file? Could you please show us your job design?
Best regards
Sabrina
Anonymous
Author

Hi,
Hi,
I have created a job to process a large CSV file (around 5 million records), and the file size keeps increasing on a daily basis. I have 3 sub-jobs that process this file to extract the required information, using components such as tFileInputDelimited, tSortRow, tMap, tUniqRow, and tFileOutputDelimited. Please see the attached screenshots for one of my 3 sub-jobs.
To handle memory issues, I am using the 'Sort on disk' option with a buffer size of 100,000 for tSortRow, and 'Store temp data' for the tMaps with a buffer size of 100,000.
My problem is that this Talend job is taking too much time to process the file. It currently takes around 20 minutes, and since the file size increases daily, the processing time will keep growing too. I would like to learn how such scenarios are handled when creating Talend jobs.
 
Thanks,
Shakeel
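For context on what the 'Sort on disk' option with a row buffer does conceptually: it is an external merge sort, which sorts buffer-sized runs in memory, spills each sorted run to a temp file, and merges the runs at the end. A minimal Java sketch of the idea (my own illustrative code, not what Talend actually generates):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSortSketch {

    // Sort the lines of `input` into `output`, keeping at most
    // `bufferRows` rows in memory at a time (like tSortRow's buffer size).
    public static void sortFile(Path input, Path output, int bufferRows) throws IOException {
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            List<String> buf = new ArrayList<>(bufferRows);
            String line;
            while ((line = in.readLine()) != null) {
                buf.add(line);
                if (buf.size() == bufferRows) runs.add(spill(buf));
            }
            if (!buf.isEmpty()) runs.add(spill(buf));
        }
        mergeRuns(runs, output);
    }

    // Sort one in-memory run and write it to a temp file.
    private static Path spill(List<String> buf) throws IOException {
        Collections.sort(buf);
        Path run = Files.createTempFile("run", ".txt");
        Files.write(run, buf);
        buf.clear();
        return run;
    }

    // K-way merge: a priority queue holds the current head line of each run.
    private static void mergeRuns(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Head> pq = new PriorityQueue<>(Comparator.comparing((Head h) -> h.line));
        List<BufferedReader> readers = new ArrayList<>();
        for (Path run : runs) {
            BufferedReader r = Files.newBufferedReader(run);
            readers.add(r);
            String first = r.readLine();
            if (first != null) pq.add(new Head(first, r));
        }
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            while (!pq.isEmpty()) {
                Head h = pq.poll();
                out.write(h.line);
                out.newLine();
                String next = h.reader.readLine();
                if (next != null) pq.add(new Head(next, h.reader));
            }
        }
        for (BufferedReader r : readers) r.close();
        for (Path run : runs) Files.deleteIfExists(run);
    }

    private static final class Head {
        final String line;
        final BufferedReader reader;
        Head(String line, BufferedReader reader) { this.line = line; this.reader = reader; }
    }
}
```

With a 100,000-row buffer and roughly 5 million rows, that is about 50 runs to merge; a larger buffer (together with more JVM heap, configurable via -Xms/-Xmx in the Job's advanced run settings) means fewer runs and less disk I/O, which is often where most of the time goes.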