vasan212
Contributor

Running the same Spark Job in multiple Instances/threads

Hi All,

I have a use case with around 100 input files in an S3 bucket. All of the source file information (file name, source, source directory, target, etc.) is stored in a metadata table and passed as parameters to a Spark batch job via context variables. I need the job to process all the files in parallel rather than iterating over them, so that the job is triggered in multiple instances at the same time with 100 different parameter sets (one per file).

Can this be achieved with a Talend Spark batch job?


About the process in the job: the job fetches a file and, after partitioning, writes it in Parquet format to another S3 bucket.
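The fan-out pattern described above can be sketched outside of Talend with a bounded thread pool: one task per metadata row, each carrying one file's parameter set. Everything here is illustrative, assuming a hypothetical `process_file` helper standing in for one triggered instance of the Spark batch job; the bucket paths and field names are made up, not Talend or AWS APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows from the metadata table (file name, source directory,
# target, etc.) -- illustrative values only.
metadata = [
    {
        "file": f"input_{i}.csv",
        "source_dir": "s3://source-bucket/in",
        "target": "s3://target-bucket/out",
    }
    for i in range(100)
]

def process_file(params):
    # Stand-in for triggering one job instance with one file's parameters;
    # in practice this would launch the Spark batch job (for example from a
    # parent job or an external scheduler) with these values as context
    # variables, and the job would write the Parquet output to the target.
    return f"processed {params['file']} -> {params['target']}"

# Fan out: one task per metadata row, with concurrency bounded by the pool
# size so 100 files do not mean 100 simultaneous launches.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(process_file, metadata))
```

The same shape applies whatever actually launches each instance: read the metadata table once, then submit one parameterized run per row instead of looping over the rows inside a single run.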

1 Reply
Anonymous
Not applicable

Hello,

Are you referring to calling a child Spark job via tRunJob, i.e. a parent Standard Job that invokes a child Spark Job?

Let us know if this article is what you are looking for.

https://community.talend.com/t5/Architecture-Best-Practices-and/Spark-Dynamic-Context/ta-p/33038 

Best regards

Sabrina