Not applicable

QV QDS 11.2 SR12 Distribution Service Memory Leak

Hi,

We upgraded from QV11.2 SR2 to SR12 a few months ago. Since the upgrade, we have seen what appears to be a memory leak in the QDS process. After the service starts, everything works perfectly for about three days. After that, however, we notice a large backlog of tasks, and eventually the service becomes completely unresponsive.

After investigating, I noticed that since the upgrade, the Private Bytes of the "QVDistributionService.exe" process grows to about 40-50 GB of RAM over the course of a few days after a service restart. At that point, tasks begin to see huge delays in engine allocation, the tasks back up as a result, and the service eventually becomes completely unresponsive.

Here is an example QDS task log…


After the RAM of the QVDistributionService.exe process reaches 40GB+ (note the 12 MINUTE delay allocating an engine):

11/01/2016 13:00:12.6676566 Information Opening "C:\QlikviewData\Environments\Production\Some.qvw"
11/01/2016 13:12:20.9927573 Information Allocating new QlikView Engine. Current usage count=14 of 40 (of type non-reader).

After the service is restarted, and for the roughly three days before Private Bytes reaches 40 GB, engine allocation is immediate:

11/01/2016 13:40:49.9356345 Information Opening "C:\QlikviewData\Environments\Production\Some.qvw"
11/01/2016 13:40:49.9366346 Information Allocating new QlikView Engine. Current usage count=2 of 40 (of type non-reader).


When the issue occurs, there is still some 200GB+ of physical RAM available on this server, so it is not bound by physical RAM – it’s just as if the process chokes under the weight of the memory leak.
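Until the growth itself can be stopped, one guard is to sample Private Bytes periodically and restart the service before it reaches the choke point. Below is a minimal sketch of that decision logic in Python; the function name and thresholds are illustrative only, and on the actual server you would feed it readings from a Performance Monitor counter such as Process\Private Bytes for QVDistributionService.exe:

```python
def should_restart(samples_gb, limit_gb=40.0, min_growth_gb=1.0):
    """Flag a restart when Private Bytes both exceeds the limit and
    is still growing between the two most recent samples."""
    if len(samples_gb) < 2:
        return False
    growing = samples_gb[-1] - samples_gb[-2] >= min_growth_gb
    return samples_gb[-1] >= limit_gb and growing

# Hypothetical hourly readings in GB:
print(should_restart([12.0, 18.5, 27.0]))  # False: still below the limit
print(should_restart([38.0, 41.0, 43.5]))  # True: over 40 GB and climbing
```

Requiring continued growth as well as a hard limit avoids restarting on a process that is large but stable.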

I have seen another case regarding the file watcher bug in the .NET Framework, but we are on a later build of the .NET Framework, so I don't believe this is related:

Publisher Distribution Tasks Extremely Slow

I have had a ticket open with support since December, but I just wanted to see if anyone else is experiencing similar issues, or has any ideas or suggestions. Our QDS reloads the documents in place (it does not distribute them; it is a simple Reload from the first tab of the task configuration), and the QVS, which is on another server, is mounted to load documents from the same location (an SSD drive located in the QDS server).



We have tried various things suggested by support (clearing task execution history files, gathering memory statistics from the server using a third-party debugging tool). The latest suggestion I have from QT support is that our QDS/QVS configuration is the cause (i.e. QVS and QDS using the same folder). This strikes me as odd, as we have been running this configuration since 2011 on earlier releases of QV10 and QV11 without any issues at all, so I am a little surprised that it should stop working now, especially since I can see nothing in the release notes or the documentation that prohibits the QDS source folder and the QVS mount point being the same folder. As a temporary measure, we have had to implement nightly restarts of the QDS, which keeps the symptoms at bay, but this is not acceptable for us longer term.


Just wondering if anyone has any thoughts, ideas, experiences or suggestions?

Thanks and Regards,

Graeme

1 Solution

Accepted Solutions
Not applicable
Author

After many, many painful months, it turns out that this was due to a bug in the thread management of the Task Performance Summary data capture logic introduced in SR7. Every time a task runs with this setting enabled, a new thread is created but never terminated. As we run about 10k tasks per day, this kills our server fairly quickly.
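The failure mode can be illustrated with a small, self-contained Python sketch (illustrative only; the actual QDS is a .NET service and the names here are made up): each task run spawns a "summary" thread that blocks forever and is never joined, so the thread count, and with it the process footprint, grows without bound.

```python
import threading

def run_task_with_summary(stop_event):
    # Simulates the buggy behaviour: every task run starts a
    # summary-capture thread that blocks forever and is never
    # joined or signalled, so it stays alive after the task ends.
    t = threading.Thread(target=stop_event.wait, daemon=True)
    t.start()

stop_event = threading.Event()   # never set, so the threads never exit
before = threading.active_count()

for _ in range(100):             # imagine ~10k task runs per day
    run_task_with_summary(stop_event)

leaked = threading.active_count() - before
print(leaked)  # one leaked thread per task run
```

At ~10k tasks per day, thousands of parked threads accumulate within days, which matches the thread counts we observed on the QVDistributionService.exe process.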

If you enable / disable the QVBProcessSummary setting, you can turn the memory leak on and off.

The setting can be changed in the following file on the QDS server.  See the release notes for SR7 for full details though. 


C:\Windows\System32\config\systemprofile\AppData\Roaming\QlikTech\QlikViewBatch\settings.ini


EnableQVBProcessSummary=0

OR

EnableQVBProcessSummary=1
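For reference, the key sits alongside the other QlikView Batch settings in that file. Note that the section header shown below is an assumption based on how other QlikView ini files are laid out; verify the exact placement against the SR7 release notes:

```ini
; C:\Windows\System32\config\systemprofile\AppData\Roaming\QlikTech\QlikViewBatch\settings.ini
[Settings 7]                 ; section name assumed, check the release notes
EnableQVBProcessSummary=0    ; 0 = disabled (works around the leak), 1 = enabled
```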


QT have provided a fix that we will test shortly, and which I presume will come in a later SR.  If you are experiencing this problem though, you can avoid it by disabling the QVBProcessSummary setting.


12 Replies
Not applicable
Author

Smith, we had the same issue with the SR12 patch version, and our ticket is still open with support. We also have a scheduled task to restart the server overnight.

I would suggest separating the source documents and user documents into separate folders, as a best practice.

Peter_Cammaert
Partner - Champion III

At least, the separation of source and user documents (even if only temporarily) is worth a try to see if this fixes your issue. If SR13 had a fix, I'm sure Qlik support would have mentioned it already.

Peter

Not applicable
Author

Hi Peter,

If we separate the user documents and source documents folders (a setup which has been working fine for 5+ years without problems), we would also need to reconfigure hundreds of tasks to use a "distribute" instead of a "reload". I believe this would also mean redefining all of our AD group permissions in the QMC distribute actions (they are currently applied to the files as they are deployed, via an in-house deployment tool). So I'm not convinced separating them would be a small job, and it is not one I'm really keen to embark upon unless I have some hard evidence that it might actually fix the issue! I am worried that making this change will potentially introduce more issues, and that is without any guarantee that it will fix anything.

Regards,

Graeme

Not applicable
Author

Hi Dathu,

Interesting that you are seeing the same behaviour. Do you use "reload" tasks, or "distribute" to the QVS?

The number of threads we see on the QVDistributionService.exe process is incredibly large (into the thousands), and Private Bytes just grows gradually after the service is restarted, until the process seems to choke under its own weight at about 40-50 GB.

Why do you suggest separating source and user document folders as a best practice? We have the QDS and QVS on separate servers, so if we use a "distribute" instead of a reload, the QDS will presumably need to write the file via the network path configured for the QVS cluster, instead of writing locally to the SSD drive hosted in the QDS server (which is significantly faster). We don't have any issues with the QVS, so I'm not really sure why "reload" vs "distribute" would make a difference, other than slowing down the writing of the reloaded document to disk (i.e. writing it over the network instead of directly to the local disk).

Thanks

Graeme

Not applicable
Author

We set up every task as Reload & Distribute to a SAN folder which the QVS service can access.

Here are the reasons why I think we need to separate Source Documents and User Documents:

1. Source Documents should be strictly available to developers only. If source documents and user documents are the same, users who have a NAMED CAL can open the qvw file with the desktop client and edit it.

2. When users open a dashboard on the AccessPoint, .Shared and .Meta files are created. This may cause some confusion if both document types are kept in the same folder.

3. If we use distribute, the script is removed from the distributed copy, so users cannot see any protected DB details in the script.

Not applicable
Author

Thanks Dathu,

I don't think any of these points really apply to our deployment.

1. Our developers can only deploy applications via a deployment tool we have.  Developers publish first to development, QA, UAT, and other test environments, but only the release managers have permissions to promote applications to the production environment (which is handled by the deployment tool).  The developers do not have access to the production environment.

2. As per the previous point, the only thing accessing the production environment folder are the QDS and QVS processes, so there is no potential for confusion regarding the Meta and shared files.

3. Users only access via the IE Plugin or Browser client via the access point.  Our documents have section access so cannot be distributed safely to end users (the contents are not encrypted), so the only way to access them is via the access point.

Thanks for your thoughts though. 

Best Regards,

Graeme

Not applicable
Author

Generally, the AccessPoint folder (User Documents) can be accessed by all users who have a CAL. So users can access the folder and break the qvw.

Not applicable
Author

Hi Dathu,

We are using Active Directory / NTFS, so users most certainly cannot access the entire user documents folder in our deployment. Users can only access files to which they (or an AD group they are a member of) have explicitly been granted access, and for which they have the appropriate section access rights.

Regards,

Graeme

Not applicable
Author

QT Support eventually accepted there is an issue, and it has been escalated internally at QT.  Developers are investigating.  Will post when I have more info (in case anyone else is experiencing the same issue).  The QDS process just keeps on growing it's Private Bytes RAM and also the number of threads until the process just becomes unresponsive (this normally happens for us when the Private Bytes gets to about 50GB).