SJ3
Contributor III

Parallel execution with iterate

Hi,

The following job runs without parallel execution:

[screenshot 0683p000009MPdS.png: job design without parallel execution]

 

But with parallel execution enabled, this job stops updating the Hadoop file (in tHDFSOutput_1) after its first iteration. I have also enabled multi-thread execution (under the Job > Extra tab), and I am not using any context variables in this subjob, yet it still does not update that Hadoop file. I now wonder if this is a design issue. I would appreciate any help. Thanks!

 

SJ


7 Replies
Anonymous
Not applicable

Hello,

From your screenshot, we can see you are using the tSCPFileList component to iterate over and list the files and folders in an SCP root directory. How did you get these files from the SCP root directory to a local directory without using the tSCPGet component?

Best regards

Sabrina

SJ3
Contributor III
Author

I am not pulling these files into my local directory; I am putting them directly into the Hadoop directory. I don't want to store the data anywhere in between. But I am not sure why this parallel iteration is not working; it just stops after the first execution. Thanks though!

 

SJ

Anonymous
Not applicable

Hello,

Actually, Talend does not support transferring the data "in the air" like that. You have to get these files into a local directory first and then load them into the HDFS directory.

Best regards

Sabrina
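
To illustrate the two-step pattern described here (stage the files in a local directory, then load them into HDFS), below is a rough, hypothetical Java sketch using the Hadoop FileSystem API. The cluster URI and directory names are made up, and in an actual Talend job this step would normally be handled by components such as tSCPGet followed by an HDFS output component rather than hand-written code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.File;
import java.net.URI;

public class LocalToHdfsLoader {
    public static void main(String[] args) throws Exception {
        // Hypothetical locations -- adjust to your cluster and staging folder.
        String hdfsUri = "hdfs://namenode:8020";
        File stagingDir = new File("/tmp/scp_staging");   // files already fetched via SCP
        String hdfsTargetDir = "/user/talend/incoming";

        File[] staged = stagingDir.listFiles();
        if (staged == null) return;                        // nothing staged yet

        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(URI.create(hdfsUri), conf)) {
            for (File f : staged) {
                // Load each staged file into HDFS; keep the local copy, overwrite on re-run.
                fs.copyFromLocalFile(false, true,
                        new Path(f.getAbsolutePath()),
                        new Path(hdfsTargetDir, f.getName()));
            }
        }
    }
}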

 

 

SJ3
Contributor III
Author

Hi,

Thanks for the reply. But parallel execution does not work even when I try to pull those files into my local directory:
[screenshot 0683p000009LzHc.png: job design pulling the files into a local directory]

Maybe there is something wrong in this version of Talend.

 

SJ

Anonymous
Not applicable

Hello,

Are you able to update all your Hadoop files (in tHDFSOutput_1) when you pull those files into your local directory?

Actually, if the 'Multi-thread execution' box is checked, the different subjobs in the main job will execute in parallel. You need to make sure that all the subjobs run independently; that is how you take advantage of the multi-thread feature in your main job.

Here is a community knowledge article: https://community.talend.com/t5/Design-and-Development/Can-I-run-different-subjobs-in-parallel-in-a-....

Best regards

Sabrina
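
To picture why the subjobs must be independent: with multi-thread execution, the subjobs of the main job are started concurrently, much like independent tasks submitted to a thread pool. This is not Talend's generated code, only a minimal Java sketch of the idea, with made-up task names.

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class IndependentSubjobsSketch {
    public static void main(String[] args) throws InterruptedException {
        // Each Runnable stands in for one self-contained subjob.
        List<Runnable> subjobs = List.of(
                () -> System.out.println("subjob 1: load file A"),
                () -> System.out.println("subjob 2: load file B"),
                () -> System.out.println("subjob 3: load file C"));

        ExecutorService pool = Executors.newFixedThreadPool(subjobs.size());
        subjobs.forEach(pool::submit);   // all subjobs start without waiting on each other

        pool.shutdown();                 // accept no new work, let running subjobs finish
        pool.awaitTermination(5, TimeUnit.MINUTES);
    }
}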

 

SJ3
Contributor III
Author

Thanks for sharing this solution! Updating the tHDFSOutput_1 files doesn't take that long in my case, but I can still use the multi-thread process.

And you were right: parallel execution with the iterate flow works when the files are pulled into a local directory first:

[screenshot 0683p000009LzIV.png: working job design]

My file names are too long here, so I am using tFileOutputDelimited instead of tSCPGet. Thanks, xdshi!

 

SJ

Anonymous
Not applicable

It might be related to your job or to the SCP stuff. I use this approach to parallelize jobs a lot and it works very well.