Anonymous
Not applicable

Parallelize the subjob

I need to create a job that ingests a list of tables by Sqooping data from a source RDBMS into Hadoop and then into Hive.

I put the list of tables in a file, then read and iterate over it to ingest each table.

Because I have 300+ tables to ingest, running them through a single sequential process would take too long, so I need to parallelize the work.

My current idea is that the job will read the list of tables and split it into arrays of 10 tables each. Each array is then passed to a subjob for processing.
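For context, the split-into-batches-and-process-in-parallel idea described above can be sketched in plain Python (the original implementation is in Spark Scala; `ingest_table` here is a hypothetical stand-in for the real Sqoop/Hive ingestion step):

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 10  # tables per batch, as described above


def ingest_table(table_name):
    """Hypothetical placeholder for the real Sqoop -> Hadoop -> Hive ingestion."""
    return f"ingested {table_name}"


def chunk(items, size):
    """Split a list into consecutive sublists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def ingest_batch(tables):
    """Process one batch of tables; in Talend this role is played by the subjob."""
    return [ingest_table(t) for t in tables]


# Stand-in for reading the file of table names.
tables = [f"table_{n}" for n in range(25)]
batches = chunk(tables, CHUNK_SIZE)  # 25 tables -> batches of 10, 10, 5

# Run the batches in parallel, analogous to parallel subjob executions.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(ingest_batch, batches))
```

This is only a sketch of the chunking logic, not a Talend job; the Talend equivalent would iterate over the table list and fan out to subjobs.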

I have already implemented this logic in Spark Scala code. The problem is that we need to move it to a Talend job so it will be easier for the operations team to monitor and maintain, since they are only familiar with Talend, but I don't know how to implement this logic in Talend.

I would appreciate any help. Thanks.

 

1 Reply
Anonymous
Not applicable
Author

Hi mahadi-siregar,
You can try checking the 'Enable parallel execution' option in the Basic settings panel of the Iterate link, and checking the 'Use an independent process to run subjob' option on tRunJob (call the child job and pass the current table name to it).
Let me know if this improves the performance.

Regards
Shong