TXAggie00
Contributor III

TAC task stuck in "Requesting run..." - deadlock victim

Community,

 

Hoping for some insight as to my latest issue.  I jumped on PROD to deploy a breakfix for one of my client's projects and noticed that one of the Execution tasks was stuck on "Requesting run..." and had last run 5 days ago.  This particular task is based on a file trigger and probably gets executed ~50-75 times per day.  I downloaded the log and noticed the following error:

 

2017-09-14 12:08:57 ERROR ErrorLogger  - An error occured while scanning for the next trigger to fire.
org.quartz.JobPersistenceException: Couldn't acquire next trigger: Couldn't retrieve trigger: Transaction (Process ID 80) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. [See nested exception: org.quartz.JobPersistenceException: Couldn't retrieve trigger: Transaction (Process ID 80) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. [See nested exception: java.sql.SQLException: Transaction (Process ID 80) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.]]
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTrigger(JobStoreSupport.java:2785)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport$36.execute(JobStoreSupport.java:2728)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3742)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTrigger(JobStoreSupport.java:2724)
    at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:263)

So, obviously something happened on the DB side (SQL Server) to cause a lock or timeout, but there seems to be no way to recover this task.  I can't kill it, because any attempt immediately throws this error: org.talend.exception.BusinessException: executionTask.locked2.

 

 

Ultimately, I had to restart the TAC, and everything is back up and running as it should.  I would expect some issues between the TAC and SQL Server, but I would also expect some sort of built-in recovery mechanism from Talend.  If a third-party library throws an error, handle it appropriately.  Can anyone offer some insight?
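
For illustration, here is a minimal sketch of the kind of recovery the error message itself suggests ("Rerun the transaction"). It is not Talend's or Quartz's code. SQL Server reports a deadlock victim as native error 1205; the class name DeadlockRetry, the method names, and the backoff policy are hypothetical. It also assumes the java.sql.SQLException is reachable through the standard getCause() chain, which may not hold for the Quartz version the TAC embeds.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

public class DeadlockRetry {
    // SQL Server reports a deadlock victim with native error number 1205.
    static final int SQLSERVER_DEADLOCK = 1205;

    /** Walks the cause chain looking for a SQLException carrying error 1205. */
    static boolean isDeadlockVictim(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SQLException
                    && ((SQLException) c).getErrorCode() == SQLSERVER_DEADLOCK) {
                return true;
            }
        }
        return false;
    }

    /** Runs work, rerunning it (up to maxAttempts in total) when it fails as a deadlock victim. */
    static <T> T withRetry(Callable<T> work, int maxAttempts, long backoffMillis)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (Exception e) {
                if (!isDeadlockVictim(e) || attempt >= maxAttempts) {
                    throw e; // not a deadlock, or retries exhausted
                }
                Thread.sleep(backoffMillis * attempt); // linear backoff before rerunning
            }
        }
    }
}
```

Anything that is not a deadlock is rethrown immediately, so a retry like this would not mask other database failures.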

 

Thanks,

Scott

 

 

15 Replies
Anonymous
Not applicable

Hello @ris.tan 

Thanks for sharing this information with us.

Would you mind creating a case on the Talend support portal? Our colleagues from the support team would like to create a Jira work item for this use case.

Thanks for your time.

Best regards

Sabrina

TXAggie00
Contributor III
Author

Agree to disagree.  To say it is not Talend is an excuse.  We are still on 6.3.1, since every new version puts out a whole new set of hurdles.  To date, this is still a problem when the file triggers get bogged down (we used to get upwards of a thousand files thrown at us at any given time).  We've had to throttle the other side of the equation in order to avoid this issue.  Is it a database issue?  Of course: it is a deadlock, and the transaction becomes the victim of said deadlock.  But any type of software that deals with a database shouldn't just catch these errors, throw its hands in the air, and say "I give up".  It is obviously some part of your process that caused it in the first place.  That's not conjecture; that has been proven time and again.  Not sure what the latest iterations of the TAC are doing, and I shudder to think....
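
As a rough illustration of what throttling "the other side" can look like, the sketch below releases files from a staging directory into the watched directory a few at a time, so the file trigger never sees a thousand files at once. The class name FileThrottle, the directory layout, and the batch size are all hypothetical; this is not a description of our actual setup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class FileThrottle {
    /**
     * Moves at most batchSize regular files from staging into watched
     * (in name order) and returns how many files were moved.
     */
    static int releaseBatch(Path staging, Path watched, int batchSize) throws IOException {
        List<Path> batch = new ArrayList<>();
        try (Stream<Path> files = Files.list(staging)) {
            files.filter(Files::isRegularFile)
                 .sorted()
                 .limit(batchSize)
                 .forEach(batch::add);
        }
        for (Path f : batch) {
            // Keep the file name; only the parent directory changes.
            Files.move(f, watched.resolve(f.getFileName()));
        }
        return batch.size();
    }
}
```

Calling releaseBatch on a timer (for example, every minute with a small batch size) caps how many triggers can fire in any one interval.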

 

Thanks,

Scott

JBristow
Creator

I hear you Scott - and totally agree.

 

I think my pointing to database connectivity on our end as the cause of the problem comes from my frustration with our own network infrastructure, which has issues and drops connectivity between servers.

 

However - I also strongly believe that a vendor package such as Talend - with the license fees being what they are - should be able to recover, rather than having the documented solution be for customers to run a SQL script to reset database values. It's probably a more complex solution than what I've suggested - but some means of allowing administrators to gracefully reset jobs and get them running again without manual database manipulation is needed.

 

JBristow
Creator

Will do Sabrina. Thank You.

 

Case has been opened: 00134970

JBristow
Creator

The response from the opened case was that the way we are handling these issues - opening the Talend Administration database and clearing and resetting values - is correct: "The steps that you are currently following is accurate in terms of how to resolve such issues"

They have requested information in order to open up the possibility of adding a way to gracefully force stop a job(s) - so that database manipulation isn't required.

 

I have to say - honestly - that I find this answer totally shocking. Vendor products - especially ones whose license fees aren't cheap - should never require the customer to go into the product's administration database and manipulate data tables in order to clear an issue caused either by the software itself or by a drop in connectivity. Is there any documentation on the database schema, with instructions on what should or shouldn't be changed in it? I learned what to do on my own - and the key structure and table relationships aren't always apparent - so if this is the solution, then better database documentation is needed. A developer or administrator with access to the database could do more harm than good by accidentally deleting data or resetting row data to the wrong value.

 

It's a dangerous solution - period.   

 

I'll provide the information Talend requested, in the hope that a permanent fix gets put in place so database manipulation isn't required.

 

 

TXAggie00
Contributor III
Author

Well, if that was the best Talend could come up with...

 

Your assessment of their response, @ris.tan, was spot on!  Keep us posted on any updates, please.

 

Explaining this to the client should be entertaining.  They pay the rather high license fee specifically for the TAC, which is really the only reason one would choose this over TOS.  So they pay this high fee for basically a scheduler and job executor that isn't reliable and doesn't notify you when there is something wrong. 

 

As an aside, there are many options available out there that reliably do the same thing for free, just saying...

 

Thanks,

Scott