mdaly
Contributor

Talend Remote Engine and “Awaiting an available engine to be executed” when the Engine is Available

We have a Talend Remote Engine Cluster with one engine set to run up to 6 jobs concurrently. We have a plan that runs at :30 past every hour, every day. It consistently takes ~20 minutes, when it should take ~2 minutes most of the time. This is because tasks get stuck with “Awaiting an available engine to be executed”, even though the engine should be (and is) available to do work.

As an example, at 15:30 the plan started.

The first step has 1 task, starts at 15:30:08, execution starts at 15:30:13, and finishes in 5 seconds at 15:30:18.

The second step has 3 tasks, starts at 15:30:18, the longest task takes 6 seconds and it is done at 15:30:24.

The third step also has 3 tasks, starts at 15:30:24, the longest task takes 8 seconds and it is done at 15:30:32.

The fourth step is where things go wrong. It has 16 tasks.

  • The first 6 tasks queue at 15:30:32 and start by 15:30:35, the longest taking 8 seconds. All 6 have finished by 15:30:43. Awesome!
  • I would expect the next 6 tasks to start as slots are released, so that by 15:30:43 the next 6 are running.
  • However, the next 6 tasks wait exactly 4 minutes from the start time of the previous 6 within this step, so they start at 15:34:35. They all finish within 8 seconds, so why did they wait minutes only to finish in a flash?
  • Again, I would expect the next batch to start as slots are released, so that by 15:30:51 we are on to the next 6.
  • However, this batch again waits exactly 4 minutes after the previous 6 before it starts running, at 15:38:35 (instead of 15:30:51, if each remaining task in the step started as soon as a slot was freed).
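To make the gap concrete, here is a small Python sketch that replays the fourth step with the numbers above: 16 tasks, a limit of 6 concurrent tasks, and 8 seconds per task (the longest observed duration, applied to every task as a simplifying assumption). It compares a work-conserving queue, where a waiting task starts the moment any slot frees up, with the observed pattern of one batch every 4 minutes. The function names are illustrative only; nothing here models Talend's internals.

```python
import heapq
from datetime import datetime, timedelta

# Numbers taken from the fourth plan step described above.
MAX_CONCURRENT = 6                  # engine's concurrent-job limit
TASKS = 16                          # tasks in the step
TASK_TIME = timedelta(seconds=8)    # longest observed task; assumed for every task
GAP = timedelta(minutes=4)          # observed delay between batches (assumed fixed)
FIRST_START = datetime(2000, 1, 1, 15, 30, 35)  # arbitrary date; only the time matters


def expected_starts(first_start, n, slots, duration):
    """Work-conserving queue: each waiting task starts as soon as any slot frees up."""
    free_at = [first_start] * slots  # min-heap of times at which each slot becomes free
    heapq.heapify(free_at)
    starts = []
    for _ in range(n):
        start = heapq.heappop(free_at)             # earliest free slot
        starts.append(start)
        heapq.heappush(free_at, start + duration)  # slot busy until this task ends
    return starts


def observed_starts(first_start, n, slots, gap):
    """Observed pattern: batch k of `slots` tasks starts k * gap after the first batch."""
    return [first_start + (i // slots) * gap for i in range(n)]


def batch_times(starts):
    """Distinct batch start times, formatted as HH:MM:SS."""
    return [t.strftime("%H:%M:%S") for t in sorted(set(starts))]


expected = expected_starts(FIRST_START, TASKS, MAX_CONCURRENT, TASK_TIME)
observed = observed_starts(FIRST_START, TASKS, MAX_CONCURRENT, GAP)
print("expected:", batch_times(expected))
print("observed:", batch_times(observed))
print("last task starts later by:", observed[-1] - expected[-1])
```

Running it prints batch start times of 15:30:35, 15:30:43 and 15:30:51 for the expected queue versus 15:30:35, 15:34:35 and 15:38:35 for what we actually see, so in this step alone the last task starts 7 minutes 44 seconds later than it needs to.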

This happens twice within the fourth step above and twice more within the fifth step of our plan, costing us roughly 16 minutes (four 4-minute gaps) of pure waiting. We want the plan to finish faster so we can run it more frequently and leverage our home-grown queueing systems.

The fact that it is exactly 4 minutes suggests to me there's a configuration setting somewhere, but I can't find any documentation about it. Do you know where in the configuration this would be? Why wouldn't tasks within the same step of a plan start as soon as one of the previous tasks completes, instead of waiting 4 minutes before picking up the next 6?

I should add that we are running the Talend Remote Engine 2.5.0 Service on Windows Server 2016. The service itself is running; it's just that when a step has more tasks than the maximum number of concurrent tasks, we end up waiting 4 minutes for the next batch of tasks within the same step to be picked up.

6 Replies
mdaly
Contributor
Author

Didn't mean to open a separate reply. FWIW, this is still occurring this morning and is causing us to miss our SLAs.

mdaly
Contributor
Author

We upgraded to the TalendRemoteEngine 2.8.4 and the issue is still occurring.

David_Beaty
Creator III

Hi,

This isn't a helpful comment, just an observation: I've noticed that when you manually trigger a job that is also scheduled, and the manual run is still going when the scheduled one starts, the scheduled run waits 4 minutes before trying again.

So, say I manually trigger a job that runs for 2 minutes at 15:59 (so it runs 15:59-16:01) and the scheduled instance starts at 16:00. The scheduled instance sits in the same waiting state until 16:04 before trying to start the job again, which sounds essentially like the behaviour you are seeing. When one job in the plan finishes and the next one tries to start, it's as if the Remote Engine thinks it's busy, so it makes the job wait 4 minutes. I'd imagine this is a setting in TMC that you don't have any control over.
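This observation fits a simple model: when a run is requested while the engine looks busy, it is re-checked on a fixed 4-minute timer instead of being started the moment the engine frees up. The sketch below is a hypothetical model built only from the timestamps in this thread. It is not Talend's scheduler code, and `RETRY`, `start_with_fixed_retry` and `start_when_free` are made-up names. It reproduces both the 16:04 start in the example above and the 15:34:35 start of the second batch in the original post's fourth step.

```python
from datetime import datetime, timedelta

RETRY = timedelta(minutes=4)  # assumed fixed retry interval, inferred from this thread


def start_with_fixed_retry(requested, busy_until, retry=RETRY):
    """Hypothetical model: if the engine is busy when a run is requested,
    re-check only every `retry` instead of waking up when the slot frees."""
    attempt = requested
    while attempt < busy_until:  # engine still busy at this check
        attempt += retry         # wait for the next retry tick
    return attempt


def start_when_free(requested, busy_until):
    """What you would expect from a dispatcher that reacts to the slot freeing up."""
    return max(requested, busy_until)


day = datetime(2000, 1, 1)  # arbitrary date; only the time of day matters

# Manual run 15:59-16:01, scheduled trigger at 16:00.
manual_end = day.replace(hour=16, minute=1)
scheduled = day.replace(hour=16, minute=0)
print(start_with_fixed_retry(scheduled, manual_end).strftime("%H:%M"))
print(start_when_free(scheduled, manual_end).strftime("%H:%M"))

# Fourth plan step: second batch requested 15:30:35, first batch done 15:30:43.
batch2_request = day.replace(hour=15, minute=30, second=35)
first_batch_done = day.replace(hour=15, minute=30, second=43)
print(start_with_fixed_retry(batch2_request, first_batch_done).strftime("%H:%M:%S"))
```

Under these assumptions the script prints 16:04, 16:01 and 15:34:35. The difference between the two functions is the whole problem: a fixed retry timer wastes up to one full interval whenever the engine frees up just after a check.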

NagarjunaYalapalli
Contributor

Hi all,

Did anyone come across any solution for this problem?

Thanks,

Naga