QGTFS
Contributor III

Reload Engine and Distribution service instability

Hello Qlik Community,
 
I am asking for your help with an elusive issue: instability of the reload engine and, mostly, of the distribution service.
 
The symptoms are the following. At seemingly random times of day, we receive this alert:
"QMC on machine XXXXX reports that one or more Qlikview Distribution Services failed to respond. Please check QMC for more info"
Sometimes we also receive a second alert at the same time:
"The service 'ReloadEngine@XXXXX ' is down. Service Url is http://XXXXX :4720/QDS/Service"
 
For context:
We were using QlikView 11.2 until recently, when we upgraded to version 12.9 Initial Release, going to 12.7 first and then to 12.8. The problem appeared when we first upgraded to 12.7, and the situation changed with QlikView 12.8 SR2 (12.80.20200.0).
The first failures correlated with CPU overload during the reload of some reports. Since then, we've optimized the queries, which reduced the occurrence of the alert overall without eliminating it.
When the problem first appeared, we also applied a daily restart of the services, which helped but did not solve it. We later removed the restart with the latest version, as it seemed suboptimal in the new setup, which in turn reduced the frequency.
To get to this point, we followed the guides to help troubleshoot the problem: 
 
 
Part of it also seems to be related to reloads with NPrinting, which trigger the issue.
With the 12.8 SR2 version, one of the patch notes seemed to target a similar issue - QV-22417 - QVS crashed intermittently under heavy load and interaction:
"The QVS server crashed intermittently while evaluating QlikView chart object calculations. 
This was not caused by any specific calculation but was mainly connected to concurrency around shared chart objects, particularly sessions connecting/disconnecting and fast type changes of linked shared objects. 
Multiple additional synchronizations and safeguards have been introduced to safely handle sharing sessions attaching and detaching. 
Shared object with linked (replicated) object type changes have been restricted to safe combinations."
 
 
After the change from 12.8 IR to 12.8 SR2, the failures of the QlikView Distribution Service with the reload engine changed a little in behaviour:
They now appear at seemingly random times, with no regularity in the time of day, and more often, even though the specs of the server have been increased a little. We increased them because we also want to move users from the QlikView Plugin to Ajax for accessing the reports, and so wanted to keep a margin of memory.
The server runs Microsoft Windows Server 2019 Standard; the CPU is 2.30 GHz with 8 cores, and there are 72 GB of memory.
This server is dedicated to QlikView and is currently still in test, meaning it is not accessed by end users, only by admins.
We've also tried capping and uncapping CPU usage (65%, 95%, 100%) without any change. Currently, it is capped at 95%.
We've analyzed the logs as well, without finding clues or much of a trace.
Following the advice from this guide:
We've looked at performance, although it only seems to show the consequence: high CPU consumption after the service failure.
 
In 12.9 the issue persists, even with little load on the server: only QlikView, NPrinting, and me as the only connected user.
 
Has anyone encountered a similar issue, or does anyone have a clue as to what could cause the distribution failure?
 
 

 


Accepted Solutions
Chip_Matejowsky
Support

Hi @QGTFS,

The QlikView Management Service (QMS) sends a check to the QlikView Distribution Service (QDS) every ten seconds to check if QDS is running. If this timeout is exceeded, the QMS logs will return that specific error. Refer to Qlik Support article QlikView Management Service Log Error - Error Failed to retrieve QDS info: System.ServiceModel.Commu... for steps on how to mitigate it.

Best Regards

Principal Technical Support Engineer with Qlik Support
Help users find answers! Don't forget to mark a solution that worked for you!


10 Replies
Chip_Matejowsky
Support

Hi @QGTFS,

First question: what type of architecture are you employing? Single server, where all QV services are installed on a single server? Split server, where the QlikView Server service is on a dedicated server and the QlikView Distribution Service (Publisher) is on another dedicated server? Cluster, where multiple instances of the QlikView Server service/QlikView Distribution Service are installed on multiple dedicated servers?

Can you provide the QlikView Management Service (QMS) logs for the date of the QDS disconnection? The article How to collect QlikView Server logs has the details.

Best Regards

QGTFS
Contributor III
Author

Hi @Chip_Matejowsky , thank you for your help !

We run a single instance of QlikView on a single server.

As for the logs, I've just enabled debug logging for the Management Service, since normal logging did not register the latest QDS failures. I'll send the logs as soon as I have new ones.

 

I also read that it might be caused by a Windows timeout being too short, as the service failures are now very brief: at most a few seconds, and mostly less than a second (between receiving the alert and witnessing it). I have not yet found how to increase that Windows timeout, though.

QGTFS
Contributor III
Author

 

After enabling debug logging for QMS, nothing is registered when the QDS fails.

 

Chip_Matejowsky
Support

Hi @QGTFS 
If the QlikView Distribution Service (QDS) does disconnect, it will be recorded in the QlikView Management Service (QMS) log. So please monitor and, when the issue occurs, check for QDS-related errors. Please upload the log here or provide any error messages. Also, please let us know whether Publisher is enabled.

Since you are running all QlikView services on one server, all of the services must share resources. The QlikView Server service takes precedence, as it creates memory reservations with the operating system as defined in the QMC > System > Setup > QlikView Servers > QVS@ > Performance tab > Working Set. The default values are 70% for Low and 90% for High. This leaves only 10% to 30% of memory for all of the other QV services. The article Qlik Engine Memory Management provides more details.
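
As a rough worked example of that arithmetic, here is a small sketch. It assumes the 72 GB of RAM mentioned in the original post and the default Working Set values, and it treats the percentages as a plain share of physical RAM, which is a simplification. The function name is made up for illustration.

```python
def memory_left_for_other_services(total_gb, low_pct=70, high_pct=90):
    """Return (at_low, at_high): GB of RAM outside the QVS Working Set limits.

    Simplifying assumption: the Working Set percentages are applied
    directly to total physical RAM.
    """
    at_low = total_gb * (100 - low_pct) / 100
    at_high = total_gb * (100 - high_pct) / 100
    return at_low, at_high


# 72 GB is the figure given in the original post; 70/90 are the stated defaults.
at_low, at_high = memory_left_for_other_services(72)
print(f"Outside Working Set Low:  {at_low:.1f} GB")
print(f"Outside Working Set High: {at_high:.1f} GB")
```

Under those assumptions, the other services (QMS, QDS, web server, and anything else on the machine) would share somewhere between about 7 GB and 22 GB.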

In single-server environments, I have seen QDS disconnections and performance issues when the QVS becomes very busy or overloaded. If you are using just Reload Engine, ensure that you are performing the reloads only when the QVS/AccessPoint isn't being heavily used, such as overnight. If using Publisher, consider installing the QDS on its own dedicated server so that it doesn't have to share resources with the QVS. Another option would be to significantly increase system RAM and then decrease the Working Set values accordingly.

Best Regards

QGTFS
Contributor III
Author

Hi @Chip_Matejowsky , 

Thank you very much for your answer! I will investigate the RAM side.

We are not using Publisher. I'll keep monitoring the logs and update this post as soon as I have new information.

QGTFS
Contributor III
Author

In the logs after a day, the same error appeared 7 times, but only one occurrence seems to have triggered an alert email (with no difference in the logs):

An occurrence that did not trigger an email alert:

20240605T204526.224+0100
Error Failed to retrieve QDS info: System.ServiceModel.CommunicationException: QDS did not respond to request.
Last exception (for http://SERVER:4720/QDS/Service): 
The request channel timed out while waiting for a reply after 00:00:10. 
Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. 
The time allotted to this operation may have been a portion of a longer timeout. 
---> System.TimeoutException: The request channel timed out while waiting for a reply after 00:00:10. 
Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. 
The time allotted to this operation may have been a portion of a longer timeout. 
---> System.TimeoutException: The HTTP request to 'http://SERVER:4720/QDS/Service' has exceeded the allotted timeout of 00:00:10. 
The time allotted to this operation may have been a portion of a longer timeout. 
---> System.Net.WebException: The operation has timed out 
||    at System.Net.HttpWebRequest.GetResponse() 
||    at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout) 
||    --- End of inner exception stack trace --- 
||    at System.ServiceModel.Channels.HttpChannelUtilities.ProcessGetResponseWebException(WebException webException, HttpWebRequest request, HttpAbortReason abortReason) 
||    at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout) 
||    at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout) 
||    --- End of inner exception stack trace --- 
||  
|| Server stack trace:  
||    at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout) 
||    at System.ServiceModel.Dispatcher.RequestChannelBinder.Request(Message message, TimeSpan timeout) 
||    at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout) 
||    at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation) 
||    at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message) 
||  
|| Exception rethrown at [0]:  
||    at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg) 
||    at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type) 
||    at PIX.Services.IQDS.GetQDSInfo() 
||    at PIX.Services.ClientSupport.ClusterBase`1.Invoke[TR](CallType callType, Func`2 func, List`1 allResults, QlikMethodBehavior methodBehavior) 
||    --- End of inner exception stack trace --- 
||    at PIX.Services.ClientSupport.ClusterBase`1.Invoke[TR](CallType callType, Func`2 func, List`1 allResults, QlikMethodBehavior methodBehavior) 
||    at PIX.Services.ClientSupport.QDSClientImpl.GetQDSInfo() 
||    at QMSBackendCore.Communication.DistributionService.TimeProfileMethodCallGeneric[T](Func`1 method, String additionalInfo) 
||    at QMSBackendCore.Communication.DistributionService.GetQDSInfo(DistributionServiceResource qdsResource, Int32 GetQdsTimeOut)
 
The error registered just before an alert email was sent (there is no other related log entry in this time period):
20240605T234442.079+0100
Error Failed to retrieve QDS info: System.ServiceModel.CommunicationException: QDS did not respond to request.
Last exception (for http://SERVER:4720/QDS/Service): 
The request channel timed out while waiting for a reply after 00:00:10. 
Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. 
The time allotted to this operation may have been a portion of a longer timeout. 
---> System.TimeoutException: The request channel timed out while waiting for a reply after 00:00:10. 
Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. 
The time allotted to this operation may have been a portion of a longer timeout. 
---> System.TimeoutException: The HTTP request to 'http://SERVER:4720/QDS/Service' has exceeded the allotted timeout of 00:00:10. 
The time allotted to this operation may have been a portion of a longer timeout. 
---> System.Net.WebException: The operation has timed out 
||    at System.Net.HttpWebRequest.GetResponse() 
||    at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout) 
||    --- End of inner exception stack trace --- 
||    at System.ServiceModel.Channels.HttpChannelUtilities.ProcessGetResponseWebException(WebException webException, HttpWebRequest request, HttpAbortReason abortReason) 
||    at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout) 
||    at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout) 
||    --- End of inner exception stack trace --- 
||  
|| Server stack trace:  
||    at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout) 
||    at System.ServiceModel.Dispatcher.RequestChannelBinder.Request(Message message, TimeSpan timeout) 
||    at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout) 
||    at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation) 
||    at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message) 
||  
|| Exception rethrown at [0]:  
||    at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg) 
||    at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type) 
||    at PIX.Services.IQDS.GetQDSInfo() 
||    at PIX.Services.ClientSupport.ClusterBase`1.Invoke[TR](CallType callType, Func`2 func, List`1 allResults, QlikMethodBehavior methodBehavior) 
||    --- End of inner exception stack trace --- 
||    at PIX.Services.ClientSupport.ClusterBase`1.Invoke[TR](CallType callType, Func`2 func, List`1 allResults, QlikMethodBehavior methodBehavior) 
||    at PIX.Services.ClientSupport.QDSClientImpl.GetQDSInfo() 
||    at QMSBackendCore.Communication.DistributionService.TimeProfileMethodCallGeneric[T](Func`1 method, String additionalInfo) 
||    at QMSBackendCore.Communication.DistributionService.GetQDSInfo(DistributionServiceResource qdsResource, Int32 GetQdsTimeOut)
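
To see how these errors spread over the day, a small script along these lines could count them per hour. This is only a sketch: it assumes the layout pasted above (a timestamp on its own line, followed by the error line), which may differ from the raw QMS log file, and the function name is made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp layout as pasted above, e.g. 20240605T204526.224+0100
TIMESTAMP = re.compile(r"^\d{8}T\d{6}\.\d{3}[+-]\d{4}$")
MARKER = "Failed to retrieve QDS info"


def qds_timeouts(lines):
    """Return the timestamps of entries whose text contains MARKER."""
    hits = []
    current = None
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            # Remember the most recent timestamp line.
            current = datetime.strptime(line, "%Y%m%dT%H%M%S.%f%z")
        elif MARKER in line and current is not None:
            hits.append(current)
            current = None  # count each entry only once
    return hits


# The two entries quoted in this post, shortened to their first two lines.
sample = """20240605T204526.224+0100
Error Failed to retrieve QDS info: System.ServiceModel.CommunicationException: QDS did not respond to request.
20240605T234442.079+0100
Error Failed to retrieve QDS info: System.ServiceModel.CommunicationException: QDS did not respond to request.
""".splitlines()

hits = qds_timeouts(sample)
per_hour = Counter(h.strftime("%Y-%m-%d %H:00") for h in hits)
for hour, count in sorted(per_hour.items()):
    print(hour, count)
```

Comparing the hourly counts against the reload and NPrinting schedules might show whether the timeouts cluster around specific tasks.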
Chip_Matejowsky
Support

Hi @QGTFS,

The QlikView Management Service (QMS) sends a check to the QlikView Distribution Service (QDS) every ten seconds to check if QDS is running. If this timeout is exceeded, the QMS logs will return that specific error. Refer to Qlik Support article QlikView Management Service Log Error - Error Failed to retrieve QDS info: System.ServiceModel.Commu... for steps on how to mitigate it.

Best Regards

QGTFS
Contributor III
Author

Hi @Chip_Matejowsky , 

Thank you very much for your help! I'm testing it to see if it solves the issue, and I'll update the post as soon as I can confirm it works.

Kind regards

QGTFS
Contributor III
Author

Hi @Chip_Matejowsky ,

 

I followed the article and updated the file "Program Files\QlikView\Management Service\QVManagementService.exe.config" while the QMS and QDS were stopped:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <connectionStrings>
    <add name="Default" connectionString="Data Source=C:\ProgramData\QlikTech\ManagementService\DocAdmins.db;Version=3;" providerName="System.Data.SqlClient" />
  </connectionStrings>
  <appSettings>
    <!-- ****** General ****** -->
 
<!-- GetQdsInfoTimeOutInMs  - This parameter is to prevent alert of timeout-->
    <add key=" GetQdsInfoTimeOutInMs " value="20000"/> 
 
    <!-- Defaults to %PROGRAMDATA%\QlikTech\ManagementService -->
    <add key="ApplicationDataFolder" value="" />
 
I have also tried placing the key at the end, just above </appSettings>.
 
Yet the timeout value doesn't seem to change: the error still appears, with an indicated duration of 10 seconds:
Error Failed to retrieve QDS info: System.ServiceModel.CommunicationException: QDS did not respond to request.
Last exception (for http://SERVER:4720/QDS/Service): 
The request channel timed out while waiting for a reply after 00:00:10. 
Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. 
The time allotted to this operation may have been a portion of a longer timeout. 
---> System.TimeoutException: The request channel timed out while waiting for a reply after 00:00:10. 
Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. 
The time allotted to this operation may have been a portion of a longer timeout. 
---> System.TimeoutException: The HTTP request to 'http://SERVER:4720/QDS/Service' has exceeded the allotted timeout of 00:00:10. 
The time allotted to this operation may have been a portion of a longer timeout. 
---> System.Net.WebException: The operation has timed out 
||    at System.Net.HttpWebRequest.GetResponse() 
||    at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout) 
||    --- End of inner exception stack trace --- 
||    at System.ServiceModel.Channels.HttpChannelUtilities.ProcessGetResponseWebException(WebException webException, HttpWebRequest request, HttpAbortReason abortReason) 
||    at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout) 
||    at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout) 
||    --- End of inner exception stack trace --- 
||  
|| Server stack trace:  
||    at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout) 
||    at System.ServiceModel.Dispatcher.RequestChannelBinder.Request(Message message, TimeSpan timeout) 
||    at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout) 
||    at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation) 
||    at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message) 
||  
|| Exception rethrown at [0]:  
||    at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg) 
||    at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type) 
||    at PIX.Services.IQDS.GetQDSInfo() 
||    at PIX.Services.ClientSupport.ClusterBase`1.Invoke[TR](CallType callType, Func`2 func, List`1 allResults, QlikMethodBehavior methodBehavior) 
||    --- End of inner exception stack trace --- 
||    at PIX.Services.ClientSupport.ClusterBase`1.Invoke[TR](CallType callType, Func`2 func, List`1 allResults, QlikMethodBehavior methodBehavior) 
||    at PIX.Services.ClientSupport.QDSClientImpl.GetQDSInfo() 
||    at QMSBackendCore.Communication.DistributionService.TimeProfileMethodCallGeneric[T](Func`1 method, String additionalInfo) 
||    at QMSBackendCore.Communication.DistributionService.GetQDSInfo(DistributionServiceResource qdsResource, Int32 GetQdsTimeOut)
 
 
Can you help me spot where I made a mistake, or whether I forgot something?
 
Kind regards
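
One detail in the config excerpt above may be worth ruling out: the key attribute is written with a space on each side of the name. Assuming the service looks the setting up by its exact name (typical for .NET appSettings, but not confirmed here for QMS), a padded key would not be found and the default would still apply. The sketch below flags such keys; the function name is made up for illustration, and the sample is a shortened, closed version of the excerpt.

```python
import xml.etree.ElementTree as ET


def keys_with_stray_whitespace(config_xml):
    """Return every <add key="..."> value that has leading or trailing whitespace."""
    root = ET.fromstring(config_xml)
    suspicious = []
    for element in root.iter("add"):
        key = element.get("key")
        # <add name="..."> entries under connectionStrings have no key attribute.
        if key is not None and key != key.strip():
            suspicious.append(key)
    return suspicious


# Shortened version of the excerpt in this post, with the closing tags added
# so that it parses as a complete document.
sample = """<configuration>
  <appSettings>
    <add key=" GetQdsInfoTimeOutInMs " value="20000"/>
    <add key="ApplicationDataFolder" value="" />
  </appSettings>
</configuration>"""

for key in keys_with_stray_whitespace(sample):
    print(f"Key with stray whitespace: {key!r}")
```

If the real file is checked the same way, any key reported here could be retyped without the padding before restarting the services.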