QV 11.2 SR2 QDS Issues

Hi,

We recently upgraded from QV10 SR4 to QV11.2 SR2 and have noticed a lot of odd behaviour in the new QDS environment with regard to task scheduling, task execution, and task result status reporting in the QMC.  We have raised cases with QT support, but I thought I'd see if anyone else was seeing similar issues.

The main oddities we have noticed:

  1. Tasks not firing - I have observed some tasks which have simply not been executed at their scheduled time.  (E.g. could be a daily schedule that works fine for a week and then just misses a day).
  2. Tasks not reporting error status back to QMC correctly. E.g. task reaches timeout and then fails, but does not show up as failed in the QMC.  This makes it extremely difficult to manage the production environment.
  3. Intermittent and seemingly random COM exception errors.  We are seeing about 10 of these per day.  This problem is coupled with issue 2 above - i.e. the failed statuses are not always reported back.  Here is an example:
     QDSMain.Exceptions.DistributionFailedException: Distribute failed with errors to follow.
     ---> QDSMain.Exceptions.ReloadFailedException: Reload failed
     ---> QDSMain.Exceptions.LogBucketErrorException: The sourcedocument failed to reload.. Exception=System.Runtime.InteropServices.COMException (0x800706BE): The remote procedure call failed. (Exception from HRESULT: 0x800706BE)
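
For anyone trying to read the HRESULT in that trace, here is a minimal sketch that splits it into its standard fields. The function name `decode_hresult` is made up for illustration and is not part of any QlikView tooling. The value 0x800706BE breaks down into facility 7 (FACILITY_WIN32) and Win32 error code 1726 (RPC_S_CALL_FAILED), which matches the "remote procedure call failed" message. On its own this only says that a call into another process failed part-way through; it does not say why.

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    return {
        "failure": bool((hresult >> 31) & 1),   # top bit set means failure
        "facility": (hresult >> 16) & 0x7FF,    # 11-bit facility field
        "code": hresult & 0xFFFF,               # low 16 bits: error code
    }


info = decode_hresult(0x800706BE)
print(info)  # {'failure': True, 'facility': 7, 'code': 1726}
```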

We had exactly the same configuration on QV10 in terms of task schedules and dependencies (i.e. the QV10 documents and tasks were migrated across "as is"), and we did not see any of these issues on the QV10 SR4 QDS.

The new environment is clustered (although we are only running one hot QDS node currently), and the servers are 40 core / 256GB RAM. The max number of QDS engines has been set to 40, and the heap size has been increased as per QV Support's recommendations based on the hardware configuration.  The server doesn't appear to be resource bound.  It would be great to know if anyone else is experiencing any strange behaviour with QV11.2 SR2.

Any feedback would be great - even if it's just a "we're seeing no problems on version X".

Thanks,

Graeme

1 Solution

Accepted Solutions
Not applicable
Author

For anyone else experiencing similar types of symptoms:

Issues 1 and 2 from my original post appear to have been caused by a combination of two issues -

  1. what look like bugs in the QDS clustering logic, causing tasks to be "lost" and statuses not to be reported back when one of the QDS nodes in the cluster is on warm standby (i.e. the server is a member of the cluster, but the QDS service is stopped).  QT are investigating, but since we removed the standby node from the cluster, the issues appear to have calmed down.
  2. the QMS chunk size issue discussed in the replies below.

Issue 3 has been acknowledged as a bug and is currently under investigation by QT Support.

It looks like there may be a potentially related bug fixed in 11.2 SR4, but I don't have enough information yet to say if it will fix the problems we were seeing or not:

61946 Removing QDS from Cluster - Tasks Still Load Balance to Removed Node

Thanks for your input Karthikeyan S

10 Replies
jochem_zw
Partner Ambassador

Can you try recreating a task and see whether it runs correctly over a period of time?

It seems you have a big environment, probably with a lot of tasks.

I don't think this is the solution you are hoping for, but it may be one way to tackle your problem.

It is better to wait for an answer from QlikView so they can tackle this problem.


Not applicable
Author

Hi Jochem,

All of the tasks were created in this environment via the QMS API (I wrote some code to migrate the tasks from our QV10 environment).  Interesting point though - we could try creating some tasks manually and see if they are impacted.  The difficulty with these issues, however, is that they are intermittent, so it's very hard to tell whether a new task has not been impacted because it is 'lucky', or because creating it manually has actually fixed the problem.
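
To put a rough number on the 'lucky' problem, here is a minimal sketch. It assumes each scheduled run fails independently with a fixed probability, which is a simplification and may well not hold for these bugs. The function name `runs_needed` and the example failure rate of one miss in seven runs are assumptions for illustration, not measurements from our environment.

```python
import math


def runs_needed(p_fail: float, confidence: float) -> int:
    """Clean runs needed before a failure-free streak is unlikely to be luck.

    Assumes each run fails independently with probability p_fail.
    A streak of n clean runs happens by chance with probability
    (1 - p_fail) ** n, so we need that to drop below 1 - confidence.
    """
    if not (0 < p_fail < 1 and 0 < confidence < 1):
        raise ValueError("p_fail and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail))


# Hypothetical rate: a daily task that misses about one run in seven.
print(runs_needed(1 / 7, 0.95))  # 20
```

Under those assumptions, a manually created daily task would need about 20 clean days before its clean record says much at the 95% level.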

I really hope that is not the cause, as I don't like the idea of creating 1000+ tasks by hand - I think it would probably create more problems than it fixes!

QV are investigating - we have a few cases open - but I just wanted to get some feedback from other users. 

Thanks for your input.

Regards,

Graeme

gainkarthi
Partner - Specialist

Hi,

We were experiencing the same problem in QV11.2 SR2.

Reducing chunk size may fix the triggering issue.

Apart from that, restarting the services reduces this problem. I restart the services and mostly do not experience the problem again afterwards.

Regards
Karthi

Not applicable
Author

Hi Karthikeyan,

When you say "chunk size", do you mean "heap size"?  Have you found that reducing the heap size fixed this issue?

We are not really in a position where we can restart the services on a regular basis.  We have a 24/7 operation, and this would impact long-running tasks and have knock-on effects for other processes, so it would not be an acceptable long-term solution for us.  Useful to hear that this has been working for you.  Perhaps if the problems get worse we may need to do this as an interim solution.

Thanks,

Graeme

Not applicable
Author

Hi Karthikeyan,

Support have confirmed the "1. Tasks not firing" issue as bug 48073.  I now know what the "chunk size" is (it's a setting in the QVManagementService.exe.config, in case anyone else is wondering).

I will try the suggested size and see if this addresses the issue.

Thanks,

Graeme

gainkarthi
Partner - Specialist

Hi Graeme,

These links may help you understand the problem more clearly:

http://community.qlik.com/message/331010#331010

http://community.qlik.com/message/334883?tstart=0

Regards,

Karthi

Anonymous
Not applicable
Author

Hi Graeme,

Regarding the bug that QV confirmed to you (48073): has this been identified as a bug only in clustered environments, or in non-clustered ones as well?

I am running QV 11.2 SR2 and have hit this issue multiple times recently (not in a cluster).

So I am wondering whether this is a bug or whether I am missing something.

Also, do you have any suggestions on how to trace back why the tasks are not getting triggered?

Thanks in advance for your help.

Regards,
Aadil.

Not applicable
Author

Hi Aadil,

My understanding is that 48073 also affects non-clustered environments, although I have not attempted to recreate the issue myself.
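
On spotting missed triggers: one rough approach, sketched below, is to compare the times a task should have fired with the start times it actually logged. This is only an illustration under assumptions. The function `find_missed_runs`, its parameters, and the sample timestamps are all hypothetical, and it assumes you have already collected the actual start times yourself (for example from the task history); it does not read any QlikView log format.

```python
from datetime import datetime, timedelta


def find_missed_runs(first_due, interval, actual_runs, until,
                     tolerance=timedelta(minutes=5)):
    """Return scheduled times that have no actual run within the tolerance."""
    missed = []
    due = first_due
    while due <= until:
        # A scheduled slot counts as executed if any run started close to it.
        if not any(abs(run - due) <= tolerance for run in actual_runs):
            missed.append(due)
        due += interval
    return missed


# Hypothetical daily 02:00 task that skipped 4 January.
actual = [datetime(2013, 1, d, 2, 0) for d in (1, 2, 3, 5)]
missed = find_missed_runs(datetime(2013, 1, 1, 2, 0), timedelta(days=1),
                          actual, until=datetime(2013, 1, 5, 2, 0))
print(missed)  # [datetime.datetime(2013, 1, 4, 2, 0)]
```

This only tells you which runs were skipped, not why, but a list of exact missed slots is useful evidence to attach to a support case.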

Regards,

Graeme