Hi,
I tried to test the new feature called "parallelization of data flows".
Consequently, I created the Talend job below.
tFileInputDelimited with 5000000 rows,
| row1
tMap with a little transformation,
| row2
tFileOutputDelimited
For row1:
"Number of child processes" = 3 (I have 4 processors)
"QUEUE_SIZE" = 5000
For row2:
"Merge sort partitions" = yes
"QUEUE_SIZE of out1" = 5000
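To make these settings concrete, here is a minimal plain-Java sketch of the same idea: rows are partitioned round-robin onto bounded queues, several worker threads apply a transformation, and a collector gathers the results. This is not the code Talend generates. The class name PartitionSketch, the methods run and transform, and the round-robin strategy are all assumptions made for illustration. It does show that every row pays for two queue handoffs, which may cost more than a very light transformation saves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration only; not Talend's generated code.
class PartitionSketch {
    // Sentinel compared by identity, so no real row can collide with it.
    private static final String END = new String("END");

    // Stand-in for the "little transformation" done in the tMap.
    static String transform(String row) {
        return row.toUpperCase();
    }

    static List<String> run(List<String> rows, int workers, int queueSize)
            throws InterruptedException {
        List<BlockingQueue<String>> inputs = new ArrayList<>();
        BlockingQueue<String> output = new ArrayBlockingQueue<>(queueSize);
        List<Thread> threads = new ArrayList<>();

        // One bounded input queue and one thread per worker.
        for (int i = 0; i < workers; i++) {
            BlockingQueue<String> input = new ArrayBlockingQueue<>(queueSize);
            inputs.add(input);
            Thread worker = new Thread(() -> {
                try {
                    for (String row = input.take(); row != END; row = input.take()) {
                        output.put(transform(row));
                    }
                    output.put(END); // tell the collector this worker is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            threads.add(worker);
        }

        // Partitioner: distributes rows round-robin, then signals the end.
        Thread producer = new Thread(() -> {
            try {
                int i = 0;
                for (String row : rows) {
                    inputs.get(i++ % workers).put(row);
                }
                for (BlockingQueue<String> queue : inputs) {
                    queue.put(END);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        threads.add(producer);

        // Collector: runs in the calling thread until every worker has finished.
        List<String> result = new ArrayList<>();
        int finished = 0;
        while (finished < workers) {
            String row = output.take();
            if (row == END) {
                finished++;
            } else {
                result.add(row);
            }
        }
        for (Thread t : threads) {
            t.join();
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> rows = Arrays.asList("a", "b", "c", "d", "e");
        // 3 workers as in the job above; the output order is not guaranteed.
        System.out.println(run(rows, 3, 5000));
    }
}
```

In this sketch the collector does not restore the input order. If order matters, the results would have to be sorted or merged afterwards.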
Unfortunately, my results are really bad: the new functionality increases the processing time by 10 seconds.
On the internet, I only saw examples of this feature with a tSortRow component.
Could you tell me if it is possible to use this new feature with other Talend components?
Regards.
Hi,
Based on your job description, I don't think you have joins in the tMap. What you are doing is a filtering operation, so you can use the tFilterRow component instead. Also, since you are not getting a memory error, you could consider using the Hash components to replicate the data to two tFilterRow components and write the data independently to your output device.
Vaibhav
Yes, that is exactly what I wanted to know:
- Is it possible to use this new feature with a component other than tSortRow?
- Could you please give us some "use cases"?
Thank you in advance,
Regards.