Anonymous
Not applicable

How to Design a Job to Reconnect to and Re-run a Talend Job after a DB connection Failure

Hi,

 

I am using Talend Open Studio. I am trying to design a job that reconnects and re-runs when a job fails due to a DB connection failure. After each failed iteration, the job should sleep for 20 seconds. I would like the job to retry connecting (for a fixed number of times) until there is a successful connection.

I tried the following steps:

Parent job flow : tLoop -----> tRunJob -----> tJavaRow

 

tLoop (While loop) condition: context.status (status is a boolean context variable with a default value of true)

tJavaRow : 

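    // Stop looping once the child job reports success.
    if (row1.errorCode == 0) {
        context.status = false;
    } else if (row1.errorCode == 1
            && ((Integer) globalMap.get("tLoop_1_CURRENT_ITERATION")) < 15) {
        // Failure with attempts remaining: keep looping after a short pause.
        context.status = true;
        Thread.sleep(5 * 1000);
    } else {
        // Attempts exhausted or unexpected error code: stop.
        context.status = false;
    }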

 

This is not working for me. Please help me to design a job to retry after a failure.
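
For reference, the behaviour being asked for (retry a fixed number of times, sleeping between failed attempts) can be sketched in plain Java outside Talend. This is only an illustration under assumptions: the class RetrySketch and the method runWithRetry are hypothetical names, not Talend APIs, and the simulated task stands in for the child job.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Talend: retries a task a fixed number of times.
class RetrySketch {

    // Runs the task up to maxAttempts times, sleeping delayMillis after each failed attempt.
    // Returns the 1-based attempt number that succeeded, or -1 if no attempt succeeded.
    static int runWithRetry(Callable<Boolean> task, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean ok;
            try {
                ok = task.call();
            } catch (Exception e) {
                ok = false; // any exception (e.g. a connection error) counts as a failed attempt
            }
            if (ok) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated job (assumption): fails on the first two calls, then succeeds.
        int attempt = runWithRetry(() -> ++calls[0] >= 3, 15, 10);
        System.out.println("Succeeded on attempt " + attempt);
    }
}
```

In a Talend job, the tLoop plays the role of the for loop, the tRunJob is the task, and the sleep would be 20 * 1000 milliseconds to match the requirement above.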

10 Replies
Anonymous
Not applicable
Author

You are on the right track. It is difficult to walk you through this step by step without sitting with you, but here are some pointers.

 

1) The looping mechanism you are using is a good idea. Keep that.

2) At the end of your tRunJob, add a RunIf link with the IF condition set to simply true. This means it will fire every time.

3) Add a tJava after the RunIf and use a condition similar to the one below (I have estimated what it should be based on what you posted):

    if (((Integer) globalMap.get("tRunJob_1_CHILD_RETURN_CODE")) == 0) {
        context.status = false;
    } else if (((Integer) globalMap.get("tRunJob_1_CHILD_RETURN_CODE")) == 1
            && ((Integer) globalMap.get("tLoop_1_CURRENT_ITERATION")) < 15) {
        context.status = true;
        Thread.sleep(5 * 1000);
    } else {
        context.status = false;
    }

The condition above uses the child job's return code from the tRunJob (((Integer)globalMap.get("tRunJob_1_CHILD_RETURN_CODE"))). This assumes the component is named tRunJob_1; adjust the name if yours differs.
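
To check the decision logic in isolation, here is a minimal standalone sketch that simulates globalMap with a plain HashMap. The class RetryDecisionSketch, the method shouldRetry, and the null guard are my own additions for illustration and are not generated by Talend; the two keys are the ones used in the condition above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical test harness: mimics the tJava condition outside Talend.
class RetryDecisionSketch {
    static final int MAX_ITERATIONS = 15;

    // Returns the value that would be assigned to context.status:
    // true = loop again, false = stop looping.
    static boolean shouldRetry(Map<String, Object> globalMap) {
        Integer returnCode = (Integer) globalMap.get("tRunJob_1_CHILD_RETURN_CODE");
        Integer iteration = (Integer) globalMap.get("tLoop_1_CURRENT_ITERATION");
        // Assumption: stop if either value is missing, instead of failing on unboxing null.
        if (returnCode == null || iteration == null) {
            return false;
        }
        if (returnCode == 0) {
            return false; // child job succeeded
        }
        // Retry only on return code 1 and only while attempts remain.
        return returnCode == 1 && iteration < MAX_ITERATIONS;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tRunJob_1_CHILD_RETURN_CODE", 1);
        globalMap.put("tLoop_1_CURRENT_ITERATION", 3);
        System.out.println("retry after failure on iteration 3: " + shouldRetry(globalMap));

        globalMap.put("tRunJob_1_CHILD_RETURN_CODE", 0);
        System.out.println("retry after success: " + shouldRetry(globalMap));
    }
}
```

If the sketch behaves as expected but the job does not, the values actually stored under those keys at run time are the next thing to check.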

 

Once you change to the above, you *should* see a difference.

 

As I said, it is difficult to assess precisely what was wrong, but that would be my best guess based on what you posted.

 

 

Anonymous
Not applicable
Author

@rhall Thanks for the quick response. I tried the proposed solution: I changed the code and added the RunIf link.

The loop runs and the If condition fires, but when the DB connection comes back, the job does not reconnect and re-run.


s1.JPG
Anonymous
Not applicable
Author

Hi @SP_BI 

 

     It's good to have the loop mechanism, and you are plugging a possible issue.

 

    But why is the connection timeout or connection break happening in the first place? You may have to review the network and DB layers again, as I am worried about the issues this could cause in a production environment.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

Anonymous
Not applicable
Author

@nthampi

Thank you for checking this issue.

The DB connection failure is expected.

Sometimes the job fails because of DB connection errors. When such issues arise, I would like the job to retry connecting (for a fixed number of times) until there is a successful connection.

Please share your views

Anonymous
Not applicable
Author

@SP_BI 

 

Frankly speaking, if they are already predicting failure that way, they should work on fixing it 🙂

 

Did they give any reason why they are not fixing it? It is not a good way of doing things.

 

It would be better to document it as a risk and get sign-off from the client as an architectural assumption, so that it does not come back to you later as a design issue.

 

Warm Regards,
Nikhil Thampi


Anonymous
Not applicable
Author

Assuming that you will have to put up with this issue (and you may find your DB connection fails for many reasons, not just because "it does every so often"), this method should work for you. Can you give an example of what actually happens when the connection fails? Does the job keep trying and keep failing? Does it ever connect? The method I gave you should allow the restart to take place if everything is configured correctly. Can you give a bit more info about the failure?

Anonymous
Not applicable
Author

@rhall

 

I intentionally made the job fail with a DB connection error. The Iterate link from tLoop to tRunJob shows the number of executions finished and the running status (screenshot attached). When the DB connection comes back after 2-3 iterations, the job does not reconnect; it completes 10 iterations with the same connection error and then exits.


s2.JPG
Anonymous
Not applicable
Author

Forcing the error introduces more questions. How did you force the error? How do you know the error is not repeating over the next few iterations?
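
One way to answer the second question is to print what each iteration actually returned. The sketch below is a standalone illustration with a simulated globalMap. The class IterationLogSketch and the method describe are hypothetical names, and I am assuming the tRunJob also exposes its stack trace under tRunJob_1_CHILD_EXCEPTION_STACKTRACE; check the component's Outline view for the exact variable names in your version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical diagnostic helper: builds one log line per loop iteration.
class IterationLogSketch {

    static String describe(Map<String, Object> globalMap) {
        Object iteration = globalMap.get("tLoop_1_CURRENT_ITERATION");
        Object returnCode = globalMap.get("tRunJob_1_CHILD_RETURN_CODE");
        // Assumed key for the child job's stack trace; verify it in your Studio.
        Object stackTrace = globalMap.get("tRunJob_1_CHILD_EXCEPTION_STACKTRACE");
        // Keep only the first line of the stack trace so the log stays readable.
        String firstLine = stackTrace == null
                ? "(none)"
                : stackTrace.toString().split("\\R", 2)[0];
        return "iteration=" + iteration + " returnCode=" + returnCode + " error=" + firstLine;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tLoop_1_CURRENT_ITERATION", 2);
        globalMap.put("tRunJob_1_CHILD_RETURN_CODE", 1);
        globalMap.put("tRunJob_1_CHILD_EXCEPTION_STACKTRACE",
                "java.sql.SQLException: Connection refused\n\tat example.Child.run(Child.java:1)");
        System.out.println(describe(globalMap));
    }
}
```

Inside the tJava, the body of describe could be pasted in with the result passed to System.out.println. If the same error line appears on every iteration even after the database is reachable again, that narrows the problem to how the child job obtains its connection.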

Anonymous
Not applicable
Author

@rhall

I disconnected the DB connection for a few seconds using Edit connection in Talend and ran the job. I restored the connection within the given iterations and tested the DB connection in Talend, which showed it as successfully connected. But the running job does not recognize the restored connection.

Please correct me if I'm wrong about my requirement.