Re: How to use parallelization of data flows. - Qlik Community

DerfelCadarn · ‎2014-09-15

Hi,
I tried to test the new feature called "parallelization of data flows".
Consequently, I created the Talend job below.
tFileInputDelimited with 5000000 rows,
| row1
tMap with a little transformation,
| row2
tFileOutputDelimited
For row1:
"Number of child processes" = 3 (I have 4 processors)
"QUEUE_SIZE" = 5000.
For row2:
"Merge sort partitions" = yes
"QUEUE_SIZE of out1" = 5000
Unfortunately, my results are really bad: the new feature increases the processing time by 10 seconds.
On the internet, I only saw examples of this feature with a tSortRow component.
Could you tell me if it is possible to use this new feature with other Talend components?
Regards.

Anonymous · ‎2014-09-15

Hi,
Based on your job description, I don't think you have any joins in the tMap. What you are doing is a filtering operation, so you could use a tFilterRow component instead. Also, since you are not getting a memory error, you could consider the Hash components to replicate the data to two tFilterRow components and write each flow independently to your output.
Vaibhav

Anonymous · ‎2014-09-15

Hi all,
I guess the question is whether parallelization of data flows, as explained in the Talend help, only optimizes the tSortRow component or also applies to other components:
https://help.talend.com/search/all?query=How+to+enable+parallelization+of+data+flows&content-lang=en
It seems that this functionality is an automation of tPartitioner:
https://help.talend.com/search/all?query=tPartitioner&content-lang=en
but I don't know whether it can be used with other components, as asked...
Regards,
laurent
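
To make the partitioning idea concrete, here is a minimal Java sketch of the general pattern: rows are dealt out round-robin to a fixed number of worker threads through bounded queues, each worker applies a small transformation, and the results are collected. This is only an illustration under my own assumptions, not the code Talend generates. The class name `PartitionSketch`, the method `run`, and the `toUpperCase()` stand-in for the tMap transformation are all hypothetical; `workers` and `queueSize` loosely mirror the "Number of child processes" and "QUEUE_SIZE" settings from the first post.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PartitionSketch {
    // Unique sentinel object telling a worker that no more rows will arrive.
    // It is compared by identity, so it cannot clash with a real row.
    private static final String END = new String("__END__");

    // Hypothetical helper: partitions rows across `workers` threads,
    // each fed through a bounded queue holding at most `queueSize` rows.
    static List<String> run(List<String> rows, int workers, int queueSize)
            throws InterruptedException {
        List<BlockingQueue<String>> queues = new ArrayList<>();
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();

        for (int w = 0; w < workers; w++) {
            BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueSize);
            queues.add(queue);
            Thread t = new Thread(() -> {
                try {
                    for (String row = queue.take(); row != END; row = queue.take()) {
                        // Stand-in for the "little transformation" done in tMap.
                        out.add(row.toUpperCase());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }

        // Partition step: hand rows to the workers in round-robin order.
        // put() blocks when a queue is full, which throttles the reader.
        int next = 0;
        for (String row : rows) {
            queues.get(next).put(row);
            next = (next + 1) % workers;
        }

        // Signal end of input to every worker, then wait for them to finish.
        for (BlockingQueue<String> queue : queues) {
            queue.put(END);
        }
        for (Thread t : threads) {
            t.join();
        }
        return out; // arrival order, not input order (no merge sort here)
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> result = run(List.of("a", "b", "c", "d", "e"), 3, 2);
        System.out.println(result.size() + " rows processed");
    }
}
```

In a pattern like this, every row is handed from one thread to another through a queue. When the per-row work is very light, that hand-off can cost about as much as the work itself, which may be one reason a simple read, map, write job does not get faster. I have not measured this against a real Talend job, so treat it as a guess.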

DerfelCadarn · ‎2014-09-15

Yes, that is exactly what I wanted to know:
- Is it possible to use this new feature with a component other than tSortRow?
- Could you please give us some use cases?
Thank you in advance,
Regards.

Talend Data Integration

v5.x