
Anonymous

[resolved] Parallelization in Talend

Hi,
I tested the above test case (reading from Excel, sorting, and writing to a file, as described in the article at
https://help.talend.com/search/all?query=How+to+automatically+enable+parallelization+of+data+flows+f... ), and the results are as follows.
My configuration is:
i3 processor (4 logical cores)
4 GB RAM
Test details                                     | 1 thread (s) | 3 threads (s) | 2 threads (s) | Rows
Reading from Excel and writing to DB             | 25           | 29            | 26            | 190853
Reading from Excel and writing to file           | 3            | 15            | 5             | 190853
Reading from Excel and writing to DB             | 58           | 59            | 59            | 381706
Reading from Excel, sorting, and writing to file | 16           | 81            | 21            | 381706
Reading from Excel, sorting, and writing to file | 8            | 9             | 8             | 190853
 
It seems that enabling parallelization actually makes the jobs slower. What, then, is the use of parallelization?
 
Please explain.
 
Thanks,
Pankaj
(screenshot of the job attached)
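The pattern in the table is consistent with parallelization overhead: splitting a flow across threads adds partitioning, thread startup/coordination, and merge costs, which only pay off when the per-partition work is large enough. A minimal standalone Java sketch of the effect (illustrative code of my own, not Talend-generated; the class and method names are made up):

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelOverheadDemo {

    public static void main(String[] args) throws Exception {
        int[] data = new Random(42).ints(200_000).toArray();

        // Baseline: single-threaded sort.
        int[] a = data.clone();
        long t0 = System.nanoTime();
        Arrays.sort(a);
        long singleMs = (System.nanoTime() - t0) / 1_000_000;

        // "Parallel" version: partition, sort each half on its own thread,
        // then merge. Partitioning, thread startup, and the final merge are
        // pure overhead that only a larger workload would amortize.
        int[] b = data.clone();
        int mid = b.length / 2;
        t0 = System.nanoTime();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<?> left = pool.submit(() -> Arrays.sort(b, 0, mid));
        Future<?> right = pool.submit(() -> Arrays.sort(b, mid, b.length));
        left.get();
        right.get();
        pool.shutdown();
        int[] merged = merge(b, mid);
        long parallelMs = (System.nanoTime() - t0) / 1_000_000;

        if (!Arrays.equals(a, merged)) {
            throw new IllegalStateException("merge is wrong");
        }
        // Which one wins depends on input size and core count.
        System.out.printf("single: %d ms, 2 threads: %d ms%n", singleMs, parallelMs);
    }

    // Merge the two sorted halves b[0..mid) and b[mid..length) into one array.
    static int[] merge(int[] b, int mid) {
        int[] out = new int[b.length];
        int i = 0, j = mid, k = 0;
        while (i < mid && j < b.length) out[k++] = b[i] <= b[j] ? b[i++] : b[j++];
        while (i < mid) out[k++] = b[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }
}
```

On a small or cheap workload the two-thread version can easily lose to the single-threaded one, which matches the timings measured above for the lighter test cases.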
11 Replies
Anonymous
Author

Hi RRaj,
The rows/s figure is the row rate, i.e., the number of rows processed per second.
Did you only use the tSort component in your job? What is your target output: a database or a flat file? Could you please show us your job design?
Best regards
Sabrina
Anonymous
Author

Hi,
Hi,
I have created a job to process a large CSV file (around 5 million records), and the file size keeps increasing on a daily basis. I have 3 sub-jobs that process this file to extract the required information, using components such as tFileInputDelimited, tSortRow, tMap, tUniqRow, and tFileOutputDelimited. Please see the attached screenshots for one of my 3 sub-jobs.
To handle memory issues, I am using the 'Sort on disk' option with a buffer size of 100,000 for tSortRow, and 'Store temp data' for the tMaps with a buffer size of 100,000.
My problem is that this Talend job is taking too much time to process the file. It currently takes around 20 minutes, and since the file size increases daily, the processing time will keep growing too. I would like to learn how such scenarios are handled when creating Talend jobs.
 
Thanks,
Shakeel
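For context on what the 'Sort on disk' option with a row buffer does conceptually: it is an external merge sort, which sorts buffer-sized runs in memory, spills each sorted run to a temp file, and merges the runs at the end. A minimal Java sketch of the idea (my own illustrative code, not what Talend actually generates):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSortSketch {

    // Sort the lines of `input` into `output`, keeping at most
    // `bufferRows` rows in memory at a time (like tSortRow's buffer size).
    public static void sortFile(Path input, Path output, int bufferRows) throws IOException {
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            List<String> buf = new ArrayList<>(bufferRows);
            String line;
            while ((line = in.readLine()) != null) {
                buf.add(line);
                if (buf.size() == bufferRows) runs.add(spill(buf));
            }
            if (!buf.isEmpty()) runs.add(spill(buf));
        }
        mergeRuns(runs, output);
    }

    // Sort one in-memory run and write it to a temp file.
    private static Path spill(List<String> buf) throws IOException {
        Collections.sort(buf);
        Path run = Files.createTempFile("run", ".txt");
        Files.write(run, buf);
        buf.clear();
        return run;
    }

    // K-way merge: a priority queue holds the current head line of each run.
    private static void mergeRuns(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Head> pq = new PriorityQueue<>(Comparator.comparing((Head h) -> h.line));
        List<BufferedReader> readers = new ArrayList<>();
        for (Path run : runs) {
            BufferedReader r = Files.newBufferedReader(run);
            readers.add(r);
            String first = r.readLine();
            if (first != null) pq.add(new Head(first, r));
        }
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            while (!pq.isEmpty()) {
                Head h = pq.poll();
                out.write(h.line);
                out.newLine();
                String next = h.reader.readLine();
                if (next != null) pq.add(new Head(next, h.reader));
            }
        }
        for (BufferedReader r : readers) r.close();
        for (Path run : runs) Files.deleteIfExists(run);
    }

    private static final class Head {
        final String line;
        final BufferedReader reader;
        Head(String line, BufferedReader reader) { this.line = line; this.reader = reader; }
    }
}
```

With a 100,000-row buffer and roughly 5 million rows, that is about 50 runs to merge; a larger buffer (together with more JVM heap, configurable via -Xms/-Xmx in the Job's advanced run settings) means fewer runs and less disk I/O, which is often where most of the time goes.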