TXAggie00
Contributor III

TAC task stuck in "Requesting run..." - deadlock victim

Community,

 

Hoping for some insight into my latest issue. I jumped on PROD to deploy a breakfix for one of my client's projects and noticed that one of the execution tasks was stuck on "Requesting run..." and had last run 5 days ago. This particular task is based on a file trigger and probably gets executed ~50-75 times per day. I downloaded the log and noticed the following error:

 

2017-09-14 12:08:57 ERROR ErrorLogger  - An error occured while scanning for the next trigger to fire.
org.quartz.JobPersistenceException: Couldn't acquire next trigger: Couldn't retrieve trigger: Transaction (Process ID 80) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. [See nested exception: org.quartz.JobPersistenceException: Couldn't retrieve trigger: Transaction (Process ID 80) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. [See nested exception: java.sql.SQLException: Transaction (Process ID 80) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.]]
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTrigger(JobStoreSupport.java:2785)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport$36.execute(JobStoreSupport.java:2728)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3742)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTrigger(JobStoreSupport.java:2724)
    at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:263)

So, obviously something has happened on the DB side (SQL Server) to cause a lock or timeout, but there seems to be no way to recover this task. I can't kill it because it immediately throws this error: org.talend.exception.BusinessException: executionTask.locked2.

 

 

Ultimately, I had to restart the TAC, and everything is back running as it should. I would expect some issues between the TAC and SQL Server, but I would also expect some sort of built-in recovery mechanism from Talend. If a third-party library throws an error, it should be handled appropriately. Can anyone offer some insight?
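
To make the "built-in recovery mechanism" idea concrete: the nested SQLException itself says "Rerun the transaction", so one plausible form of recovery is a bounded retry around the failing database call. The following is only a minimal Python sketch of that idea, not Quartz's or Talend's actual code. `DeadlockVictimError` and `run_with_deadlock_retry` are hypothetical names made up for illustration; the only detail taken from SQL Server is that deadlock victims are reported with error number 1205.

```python
import time

# SQL Server reports "chosen as the deadlock victim" with error number 1205.
DEADLOCK_ERROR_CODE = 1205


class DeadlockVictimError(Exception):
    """Hypothetical stand-in for a driver exception that carries an error code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def run_with_deadlock_retry(transaction, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call transaction() and rerun it if it fails as a deadlock victim."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockVictimError as exc:
            # Retry only deadlocks, and give up once the attempts are used up.
            if exc.code != DEADLOCK_ERROR_CODE or attempt == max_attempts:
                raise
            # Wait a little longer after each failed attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Usage: a fake "acquire next trigger" call that is deadlocked twice, then succeeds.
calls = []


def acquire_next_trigger():
    calls.append(1)
    if len(calls) < 3:
        raise DeadlockVictimError(DEADLOCK_ERROR_CODE, "chosen as the deadlock victim")
    return "trigger acquired"


result = run_with_deadlock_retry(acquire_next_trigger, sleep=lambda seconds: None)
```

With a real driver, the error number would have to be read from that driver's own exception type; how to do that depends on the driver and is not shown here.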

 

Thanks,

Scott

 

 


Accepted Solutions
JBristow
Creator

The response on the support case I opened confirmed the way we are handling these issues (opening the Talend Administration Database and clearing and resetting values): "The steps that you are currently following is accurate in terms of how to resolve such issues"

They have requested information in order to open up the possibility of adding a way to gracefully force stop one or more jobs, so that database manipulation isn't required.

 

I have to say - honestly - that I find this answer totally shocking. Vendor products - especially ones that aren't cheap license fee wise - should never require the customer to go into their product administration database and manipulate data tables in order to clear an issue caused either by the software itself or by a drop in connectivity. Is there any documentation on the database schema and instructions on what should or shouldn't be changed in it? I learned what to do on my own - and the key structure and table relationships aren't always apparent - so if this is the solution then better database documentation is needed. A developer or administrator with access to the database could do more harm than good by accidentally deleting data or resetting row data to the wrong value.

 

It's a dangerous solution - period.   

 

I'll provide the information Talend requested so that, hopefully, a permanent fix can be put in place and database manipulation is no longer required.

 

 


15 Replies
Anonymous
Not applicable

Hi,

Could you please indicate on which build version you got this issue?

Here is a Jira issue: https://jira.talendforge.org/browse/TMC-1019

Best regards

Sabrina

Anonymous
Not applicable

We are also having the same problem in our environment. The task shows no progress for about 50 minutes; after that it runs, and sometimes it fails.

Anonymous
Not applicable

Hello,

Is the issue random on your side? Are there any other error messages in the TAC log?

Best regards

Sabrina

rohitpatil1993
Contributor II

Hi,

We are also facing the same issue in our production environment.

We even restarted the TAC, but we are still facing the same issue.

We are using version 6.2.1.

Please help soon.

Anonymous
Not applicable

Hello @rohitpatil1993 

There are a variety of reasons why a task's status can stay on "Requesting run..." while the job does not run.

If restarting TAC doesn't help, please run the following SQL to change the database directly.

update <schema>.executiontask set processingstate=0, status='READY_TO_RUN', errorstatus='NO_ERROR' where id=<task ID>;
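
Since this statement edits the TAC database by hand, it may be worth guarding it. The sketch below is only an illustration of one way to do that, assuming nothing about the real schema beyond the table and column names in the statement above: it runs the same UPDATE inside a transaction and rolls back unless exactly one row was changed. It uses Python's built-in `sqlite3` with an in-memory table as a stand-in for the real SQL Server database; `reset_task`, the sample row, and the `PLACEHOLDER_STUCK` status value are hypothetical.

```python
import sqlite3


def reset_task(conn, task_id):
    """Reset one executiontask row; roll back unless exactly one row changed."""
    cur = conn.cursor()
    try:
        cur.execute(
            "UPDATE executiontask "
            "SET processingstate = 0, status = 'READY_TO_RUN', errorstatus = 'NO_ERROR' "
            "WHERE id = ?",
            (task_id,),
        )
        if cur.rowcount != 1:
            raise ValueError(f"expected to update 1 row, updated {cur.rowcount}")
        conn.commit()
    except Exception:
        # Leave the table untouched if anything looks wrong.
        conn.rollback()
        raise


# Demo table: only mimics the columns named in the posted statement.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE executiontask ("
    "id INTEGER PRIMARY KEY, processingstate INTEGER, status TEXT, errorstatus TEXT)"
)
# 'PLACEHOLDER_STUCK' is a made-up value, not a real TAC status.
conn.execute("INSERT INTO executiontask VALUES (42, 1, 'PLACEHOLDER_STUCK', 'NO_ERROR')")
conn.commit()

reset_task(conn, 42)
row = conn.execute(
    "SELECT processingstate, status, errorstatus FROM executiontask WHERE id = 42"
).fetchone()
```

The row-count check is the point of the sketch: a mistyped task ID changes nothing and raises an error instead of silently succeeding. Against the real database the connection, schema prefix, and parameter placeholder would depend on the driver in use.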

It would be best to open a case on the Talend support portal for your subscription. That way, we can provide remote assistance with priority through the support cycle.

Best regards

Sabrina

Anonymous
Not applicable

Yes, we got a patch for the issue.
Anonymous
Not applicable

Restart the database; that will resolve the issue.
Anonymous
Not applicable

Hello,

Great that it is fixed. Did you receive a patch? Was restarting the database what actually resolved your issue? Thanks for your time.

Best regards

Sabrina

JBristow
Creator

I continue to see this issue. The root cause is not Talend; it is caused by connectivity to our database server dropping. When that happens, I see jobs hang in various statuses ("Requesting run...", "Running..."), and the next triggers misfire because Talend thinks the job is still active. I've discovered that it takes different solutions to get everything back on track. In some cases I can "Kill" the job when it's "Running" and everything resets. Sometimes, though, that resets the status to "Ready to Run" but the trigger stays in a "Waiting for Job To Finish" status. I've learned a lot about the Talend Administration Database and have been able to open it up and clear the status in the ExecutionTask table as well, although I must say that going into a vendor-provided database and manually making changes scares the heck out of me. The other day I restarted the TAC and everything reset. So solutions vary, but when 20+ jobs hang at the same time and I need to get them running again, it's a very painful problem.
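
When many tasks hang at once, a read-only look at the table before changing anything may reduce the risk of resetting the wrong rows. The sketch below is a rough illustration under heavy assumptions: it relies only on the table and column names from the UPDATE statement posted earlier in this thread, uses an in-memory `sqlite3` table in place of the real database, and the function name, sample rows, and `PLACEHOLDER_STUCK` value are hypothetical. Rows it returns are only candidates to review, since a task that is legitimately running would also not be in the reset state.

```python
import sqlite3


def find_reset_candidates(conn):
    """Return (id, status, processingstate) for rows that are not in the reset state."""
    return conn.execute(
        "SELECT id, status, processingstate FROM executiontask "
        "WHERE status <> 'READY_TO_RUN' OR processingstate <> 0 "
        "ORDER BY id"
    ).fetchall()


# Demo table: only mimics the columns named in the posted UPDATE statement.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE executiontask ("
    "id INTEGER PRIMARY KEY, processingstate INTEGER, status TEXT, errorstatus TEXT)"
)
conn.executemany(
    "INSERT INTO executiontask VALUES (?, ?, ?, ?)",
    [
        (1, 0, "READY_TO_RUN", "NO_ERROR"),
        # 'PLACEHOLDER_STUCK' is a made-up value, not a real TAC status.
        (2, 1, "PLACEHOLDER_STUCK", "NO_ERROR"),
        (3, 1, "PLACEHOLDER_STUCK", "NO_ERROR"),
    ],
)
conn.commit()

candidates = find_reset_candidates(conn)
candidate_ids = [task_id for task_id, _status, _state in candidates]
```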

 

Is there any way a job can be force-reset when these types of situations occur? While the issue isn't caused by Talend, having a graceful way to recover and get jobs running again within the TAC, without database changes, seems like a logical solution.

 

Thanks.