Solved: Re: Task failed due to timeout getting engine conn... - Qlik Community

JustinDallas · ‎2018-07-19

Hello Everyone,

Since the beginning of this week, I've been getting lots of failed tasks. The error always looks like this:

2018-07-19 13:30:16 UTC

Max retries reached (0)

2018-07-19 13:30:16 UTC

Changing task state from Queued to FinishedFail

2018-07-19 13:30:16 UTC

Message from ReloadProvider: Task failed due to timeout getting engine connection

2018-07-19 13:00:16 UTC

Changing task state from Triggered to Queued

2018-07-19 13:00:15 UTC

Trying to start task. Sending task to slave scheduler qliksense.dts

2018-07-19 13:00:14 UTC

Changing task state to Triggered

How do I go about diagnosing what the root problem is? I've checked the logs (mostly in the Engine directory) and I don't see anything telling me what the problem could be.

Many of the failing scripts belong to end-user apps, so they don't reach out to the DB for their data but instead read it from QVD files.

Since it looks like the reload process is waiting for 30 minutes to get a connection, I'll go see if anything has had a sharp increase in load time over the past week.
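For reference, the 30-minute figure comes straight from the two timestamps in the log above: the task went to Queued at 13:00:16 and to FinishedFail at 13:30:16. A quick check in Python (the variable names are just for illustration):

```python
from datetime import datetime

# Timestamp format used in the task log above; "UTC" is matched literally.
FMT = "%Y-%m-%d %H:%M:%S UTC"

queued = datetime.strptime("2018-07-19 13:00:16 UTC", FMT)   # Triggered -> Queued
failed = datetime.strptime("2018-07-19 13:30:16 UTC", FMT)   # Queued -> FinishedFail

wait_minutes = (failed - queued).total_seconds() / 60
print(wait_minutes)  # 30.0
```

So the task spent exactly 30 minutes in the Queued state before it was marked as failed.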

Any help on getting to the bottom of this would be greatly appreciated.

JustinDallas · ‎2018-07-19

Possible Diagnosis and Solution:

So I think I've figured out what's happening. It all traces back to when I decreased the number of tasks that can run concurrently from 4 to 3, with a Task Timeout of 30 minutes. Here is what I think happens:

(ST = SLOWTASK, NST = NOTSLOW)

- 00:00 Task A starts and completes successfully

- 00:05 Task A's completion triggers tasks [SLOWTASK-2, SLOWTASK-3, NOTSLOW-4 ... NOTSLOW-12]

- 00:10
    ST-2 still running
    ST-3 still running
    NST-4 runs and completes
    NST-5 runs and completes
    NST-6..12 queued

- 00:25
    ST-2 still running
    ST-3 still running
    NST-6 runs and completes
    NST-7 runs and completes
    NST-8..12 queued

- 00:35
    ST-2 still running <--- Time's up!
    ST-3 still running
    NST-8 runs and completes
    NST-9 runs and completes
    NST-10..12 queued <-- Incomplete, never-started tasks

Since NOTSLOW-10 through NOTSLOW-12 have been queued for longer than the Task Timeout limit, they get listed as failed. These tasks didn't fail in the ways we are accustomed to, which are usually a script error, file contention (someone is writing the file while a task wants to read it), or a problem with the data source (DB permissions, a file path that doesn't exist). The failure is literally that nothing happened: the task got queued up but never got to take the stage, or to contact the engine, as they say.

I'm going to increase my concurrent task limit and see if that provides any relief.
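For anyone who wants to sanity-check this reasoning, here is a toy model in Python. It is only a sketch of the queueing idea and not how the Qlik scheduler is actually implemented: it assumes all tasks are queued at minute 0, are served first-in-first-out, and fail if no slot has opened up by the timeout. The function name and the task durations (60 minutes for the slow tasks, 7 minutes for the others) are made up for illustration.

```python
import heapq

def tasks_timed_out_in_queue(tasks, slots, queue_timeout):
    """Toy FIFO model: every task is queued at minute 0 and needs one free slot.

    tasks: list of (name, run_minutes) in queue order.
    Returns the names of tasks still waiting when queue_timeout has passed.
    """
    free_at = [0] * slots              # minute at which each slot next becomes free
    heapq.heapify(free_at)
    timed_out = []
    for name, run_minutes in tasks:
        start = heapq.heappop(free_at)         # earliest available slot
        if start > queue_timeout:
            timed_out.append(name)             # never started, so the slot stays free
            heapq.heappush(free_at, start)
        else:
            heapq.heappush(free_at, start + run_minutes)
    return timed_out

# Hypothetical durations: two slow tasks (60 min) and nine quick ones (7 min each).
tasks = [("SLOWTASK-2", 60), ("SLOWTASK-3", 60)]
tasks += [(f"NOTSLOW-{n}", 7) for n in range(4, 13)]

print(tasks_timed_out_in_queue(tasks, slots=3, queue_timeout=30))
print(tasks_timed_out_in_queue(tasks, slots=4, queue_timeout=30))
print(tasks_timed_out_in_queue(tasks, slots=3, queue_timeout=120))
```

With these made-up numbers, three slots leave NOTSLOW-9 through NOTSLOW-12 waiting past the 30-minute mark, while either a fourth slot or a longer timeout lets every task start.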

michaelfreitas · ‎2020-08-25

I solved it on my end by changing EngineTimeout (minutes) from 30 to 120.