vidya821
Creator

Error handling for job

Hi All,

 

Can anyone please explain the concept of error handling to me?

How can I track errors from the child jobs in the parent job, so that I can send an error mail or a success mail from the parent job?

Also, if one of the child jobs fails, the complete parent job should fail and an error message should be sent.

 

What I have noticed is that the other parallel jobs continue to process even if one of the jobs fails.

How can I make the complete parent job stop and fail if one of the subjobs fails?

 

I also need clarification on the point below:

1) What is the difference between OnSubjobError and OnComponentError in the parent job, and how do I use them?

 

Attached is the sample example.

 

Thanks

1 Solution

Accepted Solutions
Anonymous
Not applicable

There is a way, but you will need to pay close attention to controlling this. For child jobs 1-1, 1-2, 2-1 and 2-2, use the mechanism I described, where you connect to a tDie and terminate the JVM (a tDie option). This will kill the other job within that parent job. But for Parent Jobs 1 and 2, set the "Use independent process to run subjob" option. That should allow your parent jobs (1 and 2) to report what has gone wrong to your top-level parent job.


17 Replies
Anonymous
Not applicable

There are several ways to achieve this depending on how difficult you want to make it.

 

A way that I like to work is to create a couple of "reporting" jobs: one to be used at the beginning of each job (using a tPreJob component) and one to be used at the end of each job (using a tPostJob component). I have a back-end database, and every time a job starts or ends I use these jobs to log details about it. There is a lot you can log, but "job name", "start date", "end date", "status", "pid", "total rows", "success rows", "failed rows", etc., might be a good place to start. The advantage of using the tPostJob to trigger your "end job" Job is that it will always run, even with a Java error, so you will always be able to log a status for the run. If you use the pid (process id) you can also link these results with the AMC functionality. I have written a piece on the AMC here (https://www.rilhia.com/tutorials/talend-activity-monitoring-console-amc). You may actually be able to get away with just using the AMC, but I use the above method to "hook in" to other bits and pieces.
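The "always runs" property described above is similar in spirit to a try/finally block in plain Java. The sketch below is only an analogy under that assumption, not Talend's generated code: the class `JobRunLog`, the method `runWithLogging` and the in-memory list standing in for the logging database are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class JobRunLog {
    // Runs a job body between a "started" entry and an "ended" entry.
    // The finally block plays the role given to tPostJob above:
    // the end entry is written even when the body throws.
    static void runWithLogging(String jobName, Runnable body, List<String> log) {
        log.add(jobName + " started");
        String status = "failed";
        try {
            body.run();
            status = "ok";
        } finally {
            log.add(jobName + " ended, status=" + status);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>(); // stand-in for the back-end logging table
        runWithLogging("child_ok", () -> { }, log);
        try {
            runWithLogging("child_bad", () -> { throw new IllegalStateException("boom"); }, log);
        } catch (IllegalStateException e) {
            // The error still propagates; the "failed" entry is already in the log.
        }
        log.forEach(System.out::println);
    }
}
```

In a real job the two log entries would be written by the reporting jobs called from tPreJob and tPostJob, with whatever columns suit your logging table.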

 

With regard to killing the job immediately, you can either make use of the "Die on child error" functionality supplied within the tRunJob component, or you can make use of the "CHILD_RETURN_CODE" globalMap variable for the job. This can be useful in deriving your own logging logic. But essentially, the job will return a number which can be accessed using the following code....

 

((Integer)globalMap.get("tRunJob_1_CHILD_RETURN_CODE"))

....where the tRunJob number (tRunJob_1 here) changes depending on which component you are referencing. If the returned number is 0, everything is fine. You can set the value that is returned using the tDie component in the child job. The tDie will also end your job immediately.
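As a rough illustration of how that return code might be turned into a status line for a success or error mail, here is a minimal plain-Java sketch. In a real job Talend supplies `globalMap`; here it is simulated with a `HashMap`, and the class and method names (`ChildStatus`, `describe`) are hypothetical. Treating a missing entry as "not run" is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

class ChildStatus {
    // Builds a one-line status from the return code a tRunJob stores in globalMap.
    // Assumption: a missing entry means that tRunJob has not run yet.
    static String describe(Map<String, Object> globalMap, String tRunJobName) {
        Integer code = (Integer) globalMap.get(tRunJobName + "_CHILD_RETURN_CODE");
        if (code == null) {
            return tRunJobName + ": not run";
        }
        return code == 0
                ? tRunJobName + ": success"
                : tRunJobName + ": failed with return code " + code;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for Talend's globalMap
        globalMap.put("tRunJob_1_CHILD_RETURN_CODE", 0);
        globalMap.put("tRunJob_2_CHILD_RETURN_CODE", 4);
        System.out.println(describe(globalMap, "tRunJob_1"));
        System.out.println(describe(globalMap, "tRunJob_2"));
        System.out.println(describe(globalMap, "tRunJob_3"));
    }
}
```

The null check matters because unboxing a missing (null) entry directly, as in a bare `(Integer) ... != 0` comparison, would throw a NullPointerException.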

 

The difference between OnSubjobError and OnComponentError is that OnSubjobError is triggered by an error anywhere in the subjob, while OnComponentError is triggered immediately by an error in the specific component it is attached to. An easy way to see how they can be used differently: with an OnSubjobError you do one thing on any error within the subjob, whereas with OnComponentError you can do different things depending on which component within the subjob fails. To be honest though, I rarely use them. They do not work as you would expect in many cases and are certainly not uniform in how they report errors. For example, OnComponentError works in very strange ways with some database components.

 

EDIT: I have just looked at your screenshot and feel I may have gone a little overboard with my brain dump here. You may be able to solve your problem quickly using RunIf links that check the status of the child job (using the code above) and connect to a tDie.

 

 

 

 

vidya821
Creator
Author

🙂
Thanks Rhall for the information. However, my question is how to fail, stop or interrupt the parent job if any of the child jobs fails.
I have attached the sample example with your suggestion to use RunIf. Is this what you meant me to implement? It is not working, so please let me know what mistake I am making.

 

Otherwise, please let me know what design changes are needed in the sample example to get the required output.


Thanks


Sample_test_error_handling.png
Anonymous
Not applicable

Yes, that was why I added the EDIT at the end of my post. If you use RunIf links after your child jobs and set the code in the RunIf to respond to the status of the child job (shown in the code section in my first description, but shown below for the RunIf code) .....

 

((Integer)globalMap.get("tRunJob_1_CHILD_RETURN_CODE"))!=0

....you can connect to a tDie and set your error message and error code. You will need to switch off the "Die on child error" tick box for this method to work.
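When several child jobs have to be checked, the same idea can be wrapped in a small helper. The following is a minimal sketch, assuming the return codes are stored under the `<tRunJob name>_CHILD_RETURN_CODE` keys shown above; the class `ChildChecks` and the method `anyChildFailed` are hypothetical names, `globalMap` is simulated with a `HashMap`, and ignoring child jobs with no entry is an assumption you may want to change.

```java
import java.util.HashMap;
import java.util.Map;

class ChildChecks {
    // True if any listed tRunJob reported a non-zero return code.
    // Assumption: a tRunJob with no entry in globalMap has not run and is ignored.
    static boolean anyChildFailed(Map<String, Object> globalMap, String... tRunJobNames) {
        for (String name : tRunJobNames) {
            Object code = globalMap.get(name + "_CHILD_RETURN_CODE");
            if (code instanceof Integer && (Integer) code != 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for Talend's globalMap
        globalMap.put("tRunJob_1_CHILD_RETURN_CODE", 0);
        globalMap.put("tRunJob_2_CHILD_RETURN_CODE", 1);
        System.out.println(anyChildFailed(globalMap, "tRunJob_1"));
        System.out.println(anyChildFailed(globalMap, "tRunJob_1", "tRunJob_2"));
    }
}
```

If such a helper were made available to the job (for example as a custom routine), a RunIf condition could call it with the job's own `globalMap` and the names of the tRunJob components to check.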

 

I would like to point out that leaving your child jobs unconnected does not make them run in parallel. It is actually a bad idea to do this. All that will happen is that Talend will run them in the order (I believe) in which they were dropped onto the workspace. It will not be true parallel running.

Anonymous
Not applicable

What your example proves is that unconnected subjobs do not run in parallel. One will run after the other. Your second job is running before your 1st job. You need to use the Enterprise Edition to get parallel running.

vidya821
Creator
Author

Yes, thanks. The RunIf works if "Die on child error" is unchecked.

About parallel running:
The child jobs are executed in parallel according to my output files (multiple files are generated at the same time).

In any case, since these child jobs belong to the same parent job, is there any way to interrupt the parent job (stop its execution) if any of the child jobs fails?
vidya821
Creator
Author

Here I have modified the sample example to display a message when each child job starts, to show that the child jobs are running in parallel.


Modified_sample_test_example.png
Anonymous
Not applicable

@vidya821 they are not running in parallel by just dropping them into a job unconnected. You can test this. Create a simple child job with one component, a tJava component. In the tJava component add this code....

 

System.out.println("Start job");
Thread.sleep(10000);

Now put two copies of this job into a parent job. You will see "Start job" printed immediately, then 10 seconds later "Start job" is printed again, and 10 seconds after that the parent job will finish. They are definitely run sequentially.
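For comparison, the difference between sequential and truly parallel execution can be reproduced outside Talend. This is a plain-Java sketch, not Talend's generated code: `RunOrderDemo`, `childJob`, `runSequentially` and `runInParallel` are hypothetical names, and each "child job" is just a task that logs, sleeps and logs again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RunOrderDemo {
    // Hypothetical stand-in for a child job: record start, sleep, record end.
    static Runnable childJob(String name, long sleepMillis, List<String> events) {
        return () -> {
            events.add("start " + name);
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            events.add("end " + name);
        };
    }

    // One after the other: each task finishes before the next one starts.
    static List<String> runSequentially(long sleepMillis, String... names) {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        for (String name : names) {
            childJob(name, sleepMillis, events).run();
        }
        return events;
    }

    // Each task on its own thread, so the sleeps can overlap.
    static List<String> runInParallel(long sleepMillis, String... names) {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(names.length);
        for (String name : names) {
            pool.submit(childJob(name, sleepMillis, events));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(runSequentially(1000, "job A", "job B"));
        System.out.println(runInParallel(1000, "job A", "job B"));
    }
}
```

In the sequential run every "end" entry appears before the next "start" entry, which matches the behaviour described above. In the threaded run the starts will typically both appear before either end, which is what real parallel execution looks like in a log.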

vidya821
Creator
Author

@rhall
Understood. The parallel execution you are talking about is using the same job twice but with different contexts, say,
whereas the one I am referring to is two sets of different jobs executed in parallel.
Am I correct?
Thanks
Anonymous
Not applicable

No. You can create two different jobs with the code I showed you (or completely different code....with a sleep so you can see it) and they will still run one after the other.