CChemali1616516071
Contributor

Sort a large data set in parallel

Hello,

I am using Talend Data Integration Studio and want to sort a large data set using the components tFileInputPositional, tSortRow and tUniqRow.

The run is slow, so I am exploring parallelization, but I cannot find the "Set parallelization" option.

How can I enable this option in Talend Data Integration Studio?

Thanks


Accepted Solutions
Anonymous
Not applicable

Hello,

The Parallelization tab is available only in the Talend subscription (paid) products, not in the open-source version.

Best regards

Sabrina

5 Replies
Anonymous
Not applicable

Hello,

Running a job in parallel means it can have several start points, each of which runs in its own thread at the same time.

With a Talend subscription product you also get the tParallel component, which triggers multiple subjobs in parallel and waits for all of them to finish.
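The "start several subjobs, then wait for all of them" behavior described above can be pictured in plain Java. This is only a sketch of the general pattern, not the code Talend generates for tParallel; the class name `ParallelSubjobsSketch`, the method `runAllAndWait` and the string-returning tasks are hypothetical stand-ins for real subjobs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSubjobsSketch {

    // Runs every task on a worker thread and blocks until all of them have finished.
    static List<String> runAllAndWait(List<Callable<String>> subjobs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subjobs.size()));
        try {
            List<String> results = new ArrayList<>();
            // invokeAll returns only after every task has completed,
            // and keeps the futures in the same order as the tasks.
            for (Future<String> future : pool.invokeAll(subjobs)) {
                results.add(future.get()); // rethrows the failure of a subjob, if any
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> subjobs = new ArrayList<>();
        // Hypothetical subjobs: each one just reports the thread it ran on.
        subjobs.add(() -> "subjob A ran on " + Thread.currentThread().getName());
        subjobs.add(() -> "subjob B ran on " + Thread.currentThread().getName());
        runAllAndWait(subjobs).forEach(System.out::println);
    }
}
```

Independent work only benefits from this pattern when the subjobs do not wait on each other; a single sort over one input does not become faster just by being placed on a separate thread.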

Your job contains components that cache rows and consume too much memory, namely tUniqRow and tSortRow. For a large data set, try configuring tUniqRow and tSortRow to store the data on disk instead of in memory. Also, allocate more memory to the job.
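To illustrate what "store the data on disk instead of in memory" means for a sort followed by duplicate removal, here is a minimal external merge sort in plain Java. It is a generic sketch, not how tSortRow or tUniqRow are implemented; the class `ExternalSortSketch`, the method `sortUnique`, the `chunkSize` parameter and the one-row-per-line text format are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSortSketch {

    // One sorted chunk file that is being read, plus the line currently at its head.
    private static final class Run {
        final BufferedReader reader;
        String head;

        Run(Path chunk) throws IOException {
            reader = Files.newBufferedReader(chunk);
            head = reader.readLine();
        }

        void advance() throws IOException {
            head = reader.readLine();
        }
    }

    // Sorts the buffered lines, writes them to a temporary file, and empties the buffer.
    private static Path spill(List<String> buffer) throws IOException {
        Collections.sort(buffer);
        Path chunk = Files.createTempFile("sort-chunk-", ".txt");
        Files.write(chunk, buffer);
        buffer.clear();
        return chunk;
    }

    // Writes the distinct lines of `input` to `output` in sorted order,
    // holding at most `chunkSize` lines in memory during the sort phase.
    static void sortUnique(Path input, Path output, int chunkSize) throws IOException {
        List<Path> chunks = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() >= chunkSize) {
                    chunks.add(spill(buffer));
                }
            }
        }
        if (!buffer.isEmpty()) {
            chunks.add(spill(buffer));
        }

        // k-way merge: the heap always yields the smallest head line among all chunks.
        PriorityQueue<Run> heap = new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (Path chunk : chunks) {
            heap.add(new Run(chunk)); // every chunk holds at least one line
        }
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            String previous = null;
            while (!heap.isEmpty()) {
                Run run = heap.poll();
                // Equal lines come out of the merge back to back, so this drops duplicates.
                if (!run.head.equals(previous)) {
                    out.write(run.head);
                    out.newLine();
                    previous = run.head;
                }
                run.advance();
                if (run.head != null) {
                    heap.add(run);
                } else {
                    run.reader.close();
                }
            }
        }
        for (Path chunk : chunks) {
            Files.deleteIfExists(chunk);
        }
    }
}
```

The trade-off is the one hinted at in the advice: memory use is bounded by the chunk size, but every row is written to and read from disk once more, so the run can stay slow even when it no longer runs out of memory.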

Best regards

Sabrina

CChemali1616516071
Contributor
Author

Hi Sabrina,

Thanks for your reply!

I did use the disk and allocated more memory, but the run is still slow.

I am trying to implement the parallelization described in the Talend Data Integration guide linked below:

https://help.talend.com/r/9bBURCEt_t~lUHE3DeE2LA/kOE3mfYu2ConQETvTlcb0Q

The guide has the following step about activating parallelization:

"Right-click the start component of the Job, tFileInputDelimited in the scenario, and from the contextual menu, select Set parallelization. Then the parallelization is automatically implemented."
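For readers without that menu option, the general idea behind parallelizing a sort (split the rows into partitions, sort each partition on its own thread, then merge the sorted partitions) can be sketched in plain Java. This is a generic illustration, not what the Set parallelization option generates; the class `PartitionedSortSketch` and its methods are hypothetical, and the rows are assumed to fit in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedSortSketch {

    // Splits `rows` into up to `partitions` slices, sorts each slice on its own
    // thread, then merges the sorted slices into one sorted list.
    static List<String> parallelSort(List<String> rows, int partitions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions));
        try {
            int sliceSize = (rows.size() + partitions - 1) / partitions; // ceiling division
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += sliceSize) {
                int end = Math.min(start + sliceSize, rows.size());
                List<String> slice = new ArrayList<>(rows.subList(start, end)); // private copy per thread
                tasks.add(() -> {
                    Collections.sort(slice);
                    return slice;
                });
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> future : pool.invokeAll(tasks)) {
                merged = merge(merged, future.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    // Standard two-way merge of two already sorted lists.
    static List<String> merge(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i).compareTo(b.get(j)) <= 0 ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }
}
```

For arrays that fit in memory, the JDK already provides `java.util.Arrays.parallelSort`, which applies the same split, sort and merge idea internally.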

 

I do not have this option in the Talend version I downloaded.

You mentioned the "Talend subscription solution". How can I subscribe?

Thanks!

-Chadi

Anonymous
Not applicable

Hello,

The Parallelization tab is available only in the Talend subscription (paid) products, not in the open-source version.

Best regards

Sabrina

CChemali1616516071
Contributor
Author

I see. Thanks!

Anonymous
Not applicable

Hello,

Feel free to let us know if there is any further help we can give.

Best regards

Sabrina