vasan212
Contributor

Running the same Spark Job in multiple Instances/threads

Hi All,

I have a use case with around 100 input files in an S3 bucket. The source file information (file name, source, source directory, target, etc.) is stored in a metadata table and passed to a Spark batch Job as a parameter via a context variable. I need the Job to process all the files in parallel instead of iterating over them, so that the Job is triggered in multiple instances at the same time with 100 different parameter sets (one per file).

Can this be achieved with a Talend Spark batch Job?


About the process in the Job: the Job fetches each file, partitions it, and writes it to another S3 bucket in Parquet format.
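To illustrate the fan-out described above, here is a minimal sketch in plain Python (outside Talend) of launching many parameterized runs concurrently with a thread pool. The `launch_job` function and the metadata rows are hypothetical stand-ins: in practice each invocation would trigger one Spark batch Job instance (e.g. via the Talend CLI or an API call) with that file's parameters.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical metadata rows: one dict per source file, mirroring the
# metadata table described above (file name, source directory, target).
metadata = [
    {"file_name": f"input_{i}.csv",
     "source_dir": "s3://src-bucket/in",
     "target_dir": "s3://tgt-bucket/out"}
    for i in range(100)
]

def launch_job(row):
    # Stand-in for triggering one parameterized Spark batch run;
    # here it just echoes the parameters it would pass as context variables.
    return f"processed {row['file_name']} -> {row['target_dir']}"

# Fan out: each file's metadata becomes one concurrent job invocation,
# instead of iterating over the files one at a time.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(launch_job, metadata))
```

Note that if all 100 files share a schema, a single Spark job reading the whole directory and writing with `partitionBy` is usually simpler than 100 separate job instances.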

1 Reply
Anonymous
Not applicable

Hello,

Are you referring to calling a child Spark Job via tRunJob, i.e. a parent Standard Job that launches a child Spark Job?

Let us know if this article is what you are looking for.

https://community.talend.com/t5/Architecture-Best-Practices-and/Spark-Dynamic-Context/ta-p/33038 

Best regards

Sabrina