sushantk19
Creator

Improving speed with the tParallelize component

Hello friends,

1. I am running 2 jobs in parallel to save time, but the process takes a long time when they execute in parallel, so I am now thinking of running them sequentially instead. How can I improve the load time when 2 jobs run in parallel? Also, what does the option "Use an independent process to run a Subjob" do?

2. The drop-down has 2 options: "End on 1st Subjob" and "End of all subjobs". If I choose "End on 1st Subjob", do the other jobs start running as soon as one of the subjobs has executed successfully? I assume that with "End of all subjobs" it waits for both jobs to complete and then executes the other jobs. Is this understanding correct?

4 Replies
vikramk
Creator II

Hi @Sushant Kapoor​ ,

You are almost there. "End of first subjob": the subjob to be synchronized is executed as soon as the first of the parallel subjobs has finished.

"End of all subjobs": the subjob to be synchronized waits and is executed only after all of the parallel subjobs have finished.
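As a rough analogy only (this is not the code Talend generates), the difference between the two options can be sketched in plain Java with `CompletableFuture.anyOf` and `CompletableFuture.allOf`. The class name `WaitForDemo`, the subjob names, and the sleep durations are all made up for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

public class WaitForDemo {

    // Hypothetical stand-in for a subjob: sleeps, then records its name in the log.
    static CompletableFuture<Void> subjob(String name, long millis, List<String> log) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            log.add(name);
        });
    }

    // Starts two parallel subjobs plus one "synchronized" step and
    // returns the order in which everything finished.
    static List<String> run(boolean waitForAll) {
        List<String> log = new CopyOnWriteArrayList<>();
        CompletableFuture<Void> slow = subjob("slow", 400, log);
        CompletableFuture<Void> fast = subjob("fast", 50, log);

        // allOf ~ "end of all subjobs", anyOf ~ "end of first subjob" (analogy only).
        CompletableFuture<?> trigger = waitForAll
                ? CompletableFuture.allOf(slow, fast)
                : CompletableFuture.anyOf(slow, fast);

        CompletableFuture<Void> next = trigger.thenRun(() -> log.add("synchronized"));

        // Wait for everything so the returned log is complete.
        CompletableFuture.allOf(slow, fast, next).join();
        return log;
    }

    public static void main(String[] args) {
        System.out.println("end of first subjob: " + run(false));
        System.out.println("end of all subjobs:  " + run(true));
    }
}
```

With these assumed timings, the "synchronized" step lands between the fast and the slow subjob in the first case, and after both in the second.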

Please go through the use case below; it will help you understand.

https://help.talend.com/reader/o2I5HrOFZtZItmjCxsjUtQ/hTNWWNoHPUOHW1i7eiq9tQ

Please mark as solution if it helps.

sushantk19
Creator
Author

@Vikram Kumar: Unfortunately, the above explanation did not solve my problem completely. I am already aware of the theory above. My 2 queries are:

  1. How can I improve job performance when 2 jobs run in parallel? It feels very slow now when my 2 jobs run in parallel.
  2. My main job design is as below:

tParallelize----> Subjob1
|
Subjob2---> Subjob3...... so on

I want Subjob1 and Subjob2 to run in parallel. Subjob1 is a time-consuming SCD2 job, so it should keep running in parallel while Subjob2, Subjob3, etc. run sequentially. What I want is that Subjob3, Subjob4, etc. keep running and do not wait until Subjob1 has completed. Which option do I use: "end of first subjob" or "end of all subjobs"?
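For reference, the execution order being asked for can be sketched in plain Java. This only models the requirement (one long branch running alongside a sequential chain); it says nothing about which tParallelize setting produces it. The class name `ParallelChainDemo` and the sleep durations are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

public class ParallelChainDemo {

    // Hypothetical stand-in for a subjob: sleeps, then records its name.
    static void work(String name, long millis, List<String> log) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        log.add(name);
    }

    // Returns the order in which the subjobs finished.
    static List<String> run() {
        List<String> log = new CopyOnWriteArrayList<>();

        // Branch A: the long-running SCD2-style job, on its own.
        CompletableFuture<Void> branchA =
                CompletableFuture.runAsync(() -> work("Subjob1", 500, log));

        // Branch B: a sequential chain that never waits for branch A.
        CompletableFuture<Void> branchB = CompletableFuture
                .runAsync(() -> work("Subjob2", 50, log))
                .thenRun(() -> work("Subjob3", 50, log))
                .thenRun(() -> work("Subjob4", 50, log));

        // The overall job ends only when both branches are done.
        CompletableFuture.allOf(branchA, branchB).join();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With the assumed durations, the chain Subjob2, Subjob3, Subjob4 completes while Subjob1 is still running, and Subjob1 finishes last.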

vikramk
Creator II

Hi @Sushant Kapoor​ ,

  1. You can improve the job performance in the following ways:
    1. Use native drivers instead of the open-source JDBC driver, although the Talend team won't provide support for native drivers.
    2. Try changing the commit size for the database.
    3. Use JVM parameters to increase the memory size.
    4. Break down the subjob that is taking a long time.

  2. You can use the following method to meet your requirement, based on the time taken by each subjob. Use a small job as the first subjob and your longest-running subjob as the second. Attach one subjob with the Synchronize option so that it starts at the end of your first (smaller) subjob, and connect all the remaining subjobs with Parallelize. Please try modifying it to fit your requirement.

[Attached image: 0693p00000AFzHoAAL.png]

Please let me know if it helps.

 

sushantk19
Creator
Author

@Vikram Kumar: Thanks for your inputs.

You can improve the job performance in the following ways:

  1. Use native drivers instead of the open-source JDBC driver, although the Talend team won't support native drivers... I am not sure how to do this.
  2. Try changing the commit size for the database... Where do we do this? I can't find the setting to change it.
  3. Use JVM parameters to increase the memory size... I guess I have already done this. Please check my attached screenshot.
  4. Break down the subjob that is taking a long time... The job design is very simple, so I can't do much about it. Please check my attached screenshot. It is just source --> tMap --> SCD2.
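On item 3, one way to confirm that a JVM memory setting such as `-Xmx` actually took effect is to print the maximum heap the running JVM reports. This is a minimal sketch using the standard `Runtime.maxMemory()` call; the class name `HeapCheck` is hypothetical, and running the same statement from inside the job (for example in a tJava component) is an assumption about the setup, not something shown in this thread.

```java
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Launch with e.g. "java -Xmx2048m HeapCheck" and compare the printed value.
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

If the printed value is well below what was configured, the arguments are probably not reaching the JVM that runs the job.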

 

 

2. I have attached my expected job design. My requirement is that the core_subscription subjob should run in parallel with the other subjobs such as core_country, core_payment, etc. Should I select the option "End on 1st Subjob" in the tParallelize component?