ankushd
Contributor

Iteration in Spark Job

Hi All,

We have a requirement to read multiple HDFS files and convert them to Parquet. The input files are present in different directories with recursive paths.

We want to iterate over all the files and pass them to the output file component. Is there a component that can iterate over the files and hold the current file name as a global variable?
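In plain Spark terms (outside the Talend component palette), what we are trying to achieve is roughly the sketch below; the paths are only placeholders and the recursiveFileLookup option needs Spark 3.0 or later:

```python
# Minimal PySpark sketch: read CSV files recursively from HDFS and write Parquet,
# keeping the source file path next to each row. Paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name

spark = SparkSession.builder.appName("csv_to_parquet").getOrCreate()

df = (
    spark.read
    .option("header", "true")
    .option("recursiveFileLookup", "true")          # Spark 3.0+: walk sub-directories
    .csv("hdfs:///data/input/")                     # placeholder root directory
    .withColumn("source_file", input_file_name())   # keep the originating file path
)

df.write.mode("overwrite").parquet("hdfs:///data/output/")  # placeholder target
```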

2 Replies
Anonymous
Not applicable

Hi,

You can do all the control part with a DI job and trigger the BD job with the independent child process option switched on.
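Outside Talend, the same split looks roughly like the sketch below: a small driver script does the iteration (the control part) and launches one Spark conversion job per input directory as its own process. The directory list and the job script name are only placeholders:

```python
# Rough sketch of the "control job + independent child process" pattern:
# iterate over input directories and launch one Spark job per directory.
# Directory list and csv_to_parquet.py are placeholders.
import subprocess

input_dirs = [
    "hdfs:///data/in/dir1",
    "hdfs:///data/in/dir2",
]

for path in input_dirs:
    out = path.replace("/in/", "/out/")
    # Each spark-submit runs as a separate process, similar to running
    # the child job with the independent process option enabled.
    subprocess.run(
        ["spark-submit", "--master", "yarn", "csv_to_parquet.py", path, out],
        check=True,
    )
```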

Warm Regards,

Nikhil Thampi

ankushd
Contributor
Author

Thanks Nikhil. We designed our job with the same logic, but we are facing processing slowness when we use a standard job.

 

We are using the below operations in the master job:

1. Download the files from S3 to local and copy them to HDFS

2. Convert the CSV files to Parquet on HDFS

3. Copy the HDFS files to local and upload them to S3

 

Currently we are not able to run more than 10 parallel flows. The job server is an 8-CPU machine and accepts only 8 tRunJob flows. Is there any solution to increase the number of parallel threads?

 

As we are facing this slowness, we decided to use pure Big Data jobs.
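What we have in mind for the pure Big Data job is a single Spark flow that reads the CSV files directly from S3 and writes Parquet back to S3, so the local and HDFS copies drop out and the parallelism comes from the Spark executors rather than from the number of tRunJob flows. A rough sketch, assuming the s3a connector and credentials are already configured on the cluster and using placeholder bucket names:

```python
# Rough sketch of a single "pure Big Data" flow: CSV in S3 -> Parquet in S3,
# with no intermediate local or HDFS copies. Bucket names are placeholders;
# the s3a connector and credentials are assumed to be configured on the cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3_csv_to_parquet").getOrCreate()

df = (
    spark.read
    .option("header", "true")
    .option("recursiveFileLookup", "true")    # walk nested input directories (Spark 3.0+)
    .csv("s3a://my-input-bucket/landing/")    # placeholder source bucket
)

# Parallelism is driven by the input splits and the executor count,
# not by the number of parent tRunJob flows.
df.write.mode("overwrite").parquet("s3a://my-output-bucket/parquet/")  # placeholder target
```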