Anonymous
Not applicable

Parallel execution takes more time than the non-parallel execution

Hi Team,

 

I followed the instructions at the link below to run a job in a loop:

https://help.talend.com/reader/mjoDghHoMPI0yuyZ83a13Q/iL2h45sTpz~InS1_0iOj5w

 

If I disable the tSleep component and select the Parallel Execution check box, the job takes longer to complete than the non-parallel execution.

 

I have attached a document with all the details about the job design and execution results.

 

Can you please let me know how to create a job with tLoop (Iterate) and parallel execution enabled, so that the parallel execution takes less time than the non-parallel execution?

 

Thanks.

 

16 Replies
Anonymous
Not applicable
Author

Hi,

 

Parallel execution depends on many parameters, such as the number of available threads, the memory available to the Talend job, CPU utilization, etc.

 

Could you please describe your use case for parallel execution? I would recommend starting with the tParallelize component for parallel processing in Talend. Please also increase the Xms and Xmx parameters of the job. You will see the difference in performance when running a job with a longer processing time.
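The point that parallelism is bounded by both threads and memory can be made concrete with a small plain-Java sketch. It is not Talend code: it reads the logical CPU count and the maximum heap (which roughly reflects the job's -Xmx setting) from the running JVM and caps the number of simultaneous iterations by whichever runs out first. The per-iteration memory figure (256 MB) is a hypothetical estimate you would replace with a measurement from your own job.

```java
public class ParallelismSizing {
    // Caps simultaneous iterations by CPU threads and by heap.
    // perTaskBytes is an assumed estimate of the heap one iteration needs,
    // not a Talend setting.
    static int suggestedParallelism(int hardwareThreads, long maxHeapBytes, long perTaskBytes) {
        if (hardwareThreads < 1 || perTaskBytes < 1) {
            throw new IllegalArgumentException("threads and per-task memory must be positive");
        }
        long byMemory = Math.max(1, maxHeapBytes / perTaskBytes); // never suggest zero
        return (int) Math.min(hardwareThreads, byMemory);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        int threads = rt.availableProcessors(); // logical CPUs visible to this JVM
        long maxHeap = rt.maxMemory();           // heap ceiling, roughly the -Xmx value
        long perTask = 256L * 1024 * 1024;       // hypothetical: about 256 MB per iteration
        System.out.printf("threads=%d, maxHeap=%d MB, suggested parallelism=%d%n",
                threads, maxHeap / (1024 * 1024),
                suggestedParallelism(threads, maxHeap, perTask));
    }
}
```

Raising -Xmx only moves the memory cap; once the suggestion equals the thread count, extra heap no longer buys more parallelism.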

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved

Anonymous
Not applicable
Author

Hi Nikhil,

 

Thanks for the response.

 

My use case is to process thousands of files in a main directory and its subdirectories.

 

After reading each document in the main directory and its subdirectories, we send it to the Apache Solr API for indexing.

 

To improve performance, I thought of using parallel execution.
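As a serial baseline for this use case, a minimal plain-Java sketch could list every regular file under the main directory and its subdirectories (similar in spirit to tFileList with subdirectories included) and hand each one to an indexing step. `sendToSolr` is a hypothetical stand-in; it only prints the path and does not call the Solr API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirectoryIndexer {
    // Recursively lists regular files under root (main directory plus all subdirectories).
    static List<Path> listFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    // Hypothetical stand-in for the indexing call; a real job would read the
    // document and post it to Solr here.
    static void sendToSolr(Path file) {
        System.out.println("indexing " + file);
    }

    // Serial baseline: processes files one after another, returns how many were processed.
    static int indexAll(Path root) throws IOException {
        List<Path> files = listFiles(root);
        for (Path f : files) {
            sendToSolr(f);
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        int n = indexAll(root);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(n + " files in " + ms + " ms");
    }
}
```

Timing this baseline first shows whether the per-file work (reading plus the Solr call) is large enough for parallelism to pay off at all.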

 

The tParallelize component is not available by default in TOS-7.1.1.

 

Is there any known issue with using the Parallel Execution option?

 

My system has 16 GB RAM and an i7-4790 processor.

 

The attachment has all the details about the job design and execution results.

 

Kindly let me know your thoughts on how we can use parallel execution for the above use case to improve performance.

 

Thanks.

 

 


File Processing Execution Results.docx
Anonymous
Not applicable
Author

Hi,

 

Since you are using the free version of Talend, the parallelism options are limited. Could you please try launching multiple instances of the same job in parallel through a scheduler, passing the directory name as a parameter? That way, each instance of the job processes a specific directory, such as DirA, DirB, etc. Please make sure you have enough memory when triggering multiple job instances.
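One way to picture this suggestion is a small launcher that starts one instance of an exported job per directory and waits for all of them. The sketch assumes the job was exported as a standalone build whose launcher script accepts `--context_param name=value`; the script path `./ProcessFiles/ProcessFiles_run.sh` and the context parameter `inputDir` are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelJobLauncher {
    // Hypothetical path to the launcher script of an exported Talend job.
    static final String JOB_SCRIPT = "./ProcessFiles/ProcessFiles_run.sh";

    // One command per directory; each instance receives its own directory through a
    // context parameter. The parameter name "inputDir" is hypothetical.
    static List<List<String>> buildCommands(List<String> dirs) {
        List<List<String>> commands = new ArrayList<>();
        for (String dir : dirs) {
            commands.add(List.of(JOB_SCRIPT, "--context_param", "inputDir=" + dir));
        }
        return commands;
    }

    public static void main(String[] args) throws Exception {
        List<String> dirs = List.of("/data/DirA", "/data/DirB");
        List<Process> running = new ArrayList<>();
        // Start every instance first so they run at the same time...
        for (List<String> cmd : buildCommands(dirs)) {
            running.add(new ProcessBuilder(cmd).inheritIO().start());
        }
        // ...then wait for each instance to finish and report its exit code.
        for (Process p : running) {
            System.out.println("exit code: " + p.waitFor());
        }
    }
}
```

Each instance is its own JVM with its own -Xmx heap, so the memory needed grows roughly with the number of instances started at once, which is why the memory check above matters.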

 

Warm Regards,
Nikhil Thampi


tchandu
Contributor
Contributor

Hello,

 

You can also use the iteration option. For this, write the processing logic in a subjob and assign the directories or files in the main job; the main job iterates over each folder or file. You can increase the number of parallel iterations based on your resources. Do not forget to check the independent subjob option when calling the subjob. Your job would look like this:

 

tFileList ---iterate--> tRunJob
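The tFileList, Iterate, tRunJob pattern can be sketched in plain Java as a fixed-size thread pool: every file path becomes one task, and at most N tasks run at once, which is roughly what setting the number of parallel iterations to N on the Iterate link asks for. This is an analogy, not the code Talend generates; `processFile` is a hypothetical stand-in for the subjob and runs inside the same JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ParallelIterate {
    // Hypothetical "subjob": stands in for what tRunJob would run for one file.
    static String processFile(String filePath) {
        return "processed " + filePath;
    }

    // Runs processFile for every path with at most parallelIterations running at once.
    static List<String> runAll(List<String> filePaths, int parallelIterations) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelIterations);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String path : filePaths) {
                futures.add(pool.submit(() -> processFile(path)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // waits for each iteration; keeps input order
            }
            return results;
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> files = List.of("/data/DirA/a.pdf", "/data/DirB/b.pdf", "/data/DirC/c.pdf");
        for (String r : runAll(files, 2)) {
            System.out.println(r);
        }
    }
}
```

Because the tasks share one JVM, there is no per-iteration start-up cost; the pool size is the knob that trades CPU and memory for throughput.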

 

 

Anonymous
Not applicable
Author

Hi Chandu,

 

Thanks for the response.

 

Here are my findings:

I designed the job as you suggested, with the following components:

tFileList ---iterate--> tRunJob

 

  • tFileList
  • Parallel option on the Iterate link, to enable/disable parallelism
  • tRunJob
  • tFixedFlowInput
  • tLogRow
  • Enabled the independent subjob option when calling the subjob

Test data: 2008 documents distributed across 8 main directories, each with 100 subdirectories.

 

Test Results:

  • Without parallelism: 3703 milliseconds
  • With parallelism: 3469 milliseconds


After enabling the independent subjob option when calling the subjob, it took 78665 milliseconds.
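Expressed as ratios, the three timings reported above compare as follows (the number of parallel iterations used for the 3469 ms run is not stated, so per-thread efficiency cannot be derived):

```latex
\begin{aligned}
S &= \frac{T_{\text{serial}}}{T_{\text{parallel}}} = \frac{3703\ \text{ms}}{3469\ \text{ms}} \approx 1.07 \\
\frac{T_{\text{independent}}}{T_{\text{serial}}} &= \frac{78665\ \text{ms}}{3703\ \text{ms}} \approx 21.2
\end{aligned}
```

In other words, parallel iteration saved about 234 ms (roughly 6%), while the independent subjob run was about 21 times slower than the serial run.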

 

Kindly let me know if any other configuration/tuning needs to done.

 

Thanks.

tchandu
Contributor
Contributor

Hello,

 

When using the independent subjob option, the system needs enough resources, because each subjob is executed as a separate instance. Try reducing the number of iterations.
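To illustrate why running each subjob as a separate instance can cost far more than the work itself, here is a rough plain-Java sketch that compares 1000 trivial in-process calls with starting one fresh JVM (`java -version`) and waiting for it. It assumes, as the explanation above suggests, that each independent subjob pays a process start-up cost; it does not reproduce Talend's behaviour, and the absolute numbers depend entirely on the machine.

```java
import java.nio.file.Paths;

public class ProcessOverhead {
    // Trivial work done inside the current JVM, standing in for a subjob
    // that runs in the same process.
    static int inProcessWork(int x) {
        return x * 2;
    }

    // Times 1000 in-process calls, in nanoseconds.
    static long timeInProcess() {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < 1000; i++) {
            sink += inProcessWork(i);
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // uses the result so the loop is not discarded
        }
        return elapsed;
    }

    // Starts a new JVM that only prints its version, waits for it, and returns the
    // elapsed nanoseconds: roughly the fixed start-up cost of one separate instance.
    static long timeNewJvm() throws Exception {
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        long start = System.nanoTime();
        Process p = new ProcessBuilder(javaBin, "-version")
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        p.waitFor();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        System.out.printf("1000 in-process calls: %.3f ms%n", timeInProcess() / 1e6);
        System.out.printf("one new JVM: %.1f ms%n", timeNewJvm() / 1e6);
    }
}
```

When per-file work is small, this fixed start-up cost is paid once per iteration and can dominate the total run time.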

 

 

Anonymous
Not applicable
Author

Hi Chandu,

 

My system has 16 GB RAM and an i7-4790 CPU.

 

I set the number of parallel executions to 2, and the job then took 148814 milliseconds.

 

Apart from Talend Open Studio 7.1.1, no other major processes are running.

 

Thanks.

Anonymous
Not applicable
Author

Hi Team.

 

Issue - Parallel execution (with the number of parallel executions set to 2) took 338 milliseconds, whereas non-parallel execution took 49 seconds to get the document names of 15 PDF files in a directory (just the file names, without reading the file content).

 

Use Case - Get the document names of the files in a directory in parallel


Talend Tool - Talend Data Integration 6.5.1

 

Here are the Main Job Design Details:

tFileList gets the list of all files in the directory.
An Iterate link connects tFileList to tRunJob, with Enable Parallel Execution selected to process the files in parallel. The number of parallel executions is set to 2.
The file name is passed to the subjob as a context parameter, as follows:
(String)globalMap.get("tFileList_1_CURRENT_FILEPATH")
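The expression above reads an `Object` from `globalMap` and casts it to `String`. A minimal plain-Java sketch, with a `HashMap` standing in for Talend's globalMap (an assumption; the generated job code is more involved), shows the per-file value being read with that cast and copied into the value handed to the subjob. `dispatch` is a hypothetical helper, not a Talend API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalMapContextParam {
    // Simulates the main job's loop: for each file, the current path is stored in a
    // shared Map<String, Object>, read back with a String cast, and recorded as the
    // context value that would be passed to the subjob.
    static List<String> dispatch(List<String> files) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for Talend's globalMap
        List<String> contextValues = new ArrayList<>();
        for (String file : files) {
            globalMap.put("tFileList_1_CURRENT_FILEPATH", file);
            // Same expression as in the job: map values are Objects, so a cast is needed.
            String contextFilePath = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
            // The local copy keeps this iteration's path even though the next
            // iteration overwrites the map entry.
            contextValues.add(contextFilePath);
        }
        return contextValues;
    }

    public static void main(String[] args) {
        for (String v : dispatch(List.of("/in/a.pdf", "/in/b.pdf"))) {
            System.out.println("subjob context FilePath=" + v);
        }
    }
}
```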
Sub Job Details:



To see the whole post, download it here
Job Design Details for Data Integration Tools.docx
OriginalPost.pdf
Anonymous
Not applicable
Author

Hi,

 

You are saying that the parallel execution completes in milliseconds, whereas the serial execution takes around 50 seconds. Could you please clarify the issue you are now facing? I am slightly confused here.

 

Warm Regards,
Nikhil Thampi
