Troubleshooting a QlikView Server: Performance and crashes
When the QlikView AccessPoint starts showing No Server, or end users report being kicked out of their applications while working in them, it’s often the QlikView server (our QIX engine) that’s to blame.
It might have crashed. Or it might simply have run out of resources, with RAM and CPU usage ramping up until the entire system eventually crashes.
So, what do we do in a situation like this?
First, we figure out whether something is actually wrong, or whether the QlikView server has simply outgrown its currently available resources. Like anything else, as usage grows, demand grows, and what we currently have available just isn’t enough anymore. A bit like a plant outgrowing its pot.
This blog post is meant to outline the troubleshooting steps to identify just that, but before we can get started we need to look at how QlikView (or our QIX engine in general) uses resources.
Increased memory (RAM) usage up to the configured Working Set Limit is expected. The increase should be gradual, building up over time.
An immediate problem can be identified if there is a sudden spike to the High Working Set Limit or beyond.
If the QlikView Server Service is hosted in a virtual environment, resources need to be dedicated.
But how do we identify an actual problem vs an expected behaviour?
Here’s how we usually do this at support.
What is using the resources?
I know, I already blamed the engine from the get-go, but we do need to make sure we aren’t pointing fingers at the wrong culprit. So, if the host operating system is running out of resources, we first want to make 100% sure that it’s qvs.exe that’s at fault. This can be determined either by watching the Processes tab in Windows Task Manager while the problem occurs, or from data previously captured by a resource monitoring tool such as Windows Performance Monitor.
If it turns out to be a qvb.exe (if the Distribution Service is on the same machine), then this got a little easier, since then it’s a reload that’s causing the problem. Troubleshooting this is, sadly, not covered in this post though. Maybe another time?
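If you would rather capture this from a script than watch Task Manager, here is a minimal Python sketch. It assumes you have saved the output of the Windows command `tasklist /FO CSV` to a string, and it sums memory per image name so that several qvb.exe processes are counted together. The function names and the sample data are made up for illustration and were not captured from a real server.

```python
import csv
import io


def parse_mem_kb(text):
    """Convert a tasklist memory string such as '1,234,567 K' to an int (KB)."""
    # Keeping only the digits also copes with locales that use '.' or ' '
    # as the thousands separator.
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else 0


def top_memory_processes(tasklist_csv, limit=3):
    """Return (image name, total KB) pairs, largest first, summed per image name."""
    totals = {}
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) < 5 or row[0] == "Image Name":
            continue  # skip the header row and malformed lines
        name = row[0].lower()
        totals[name] = totals.get(name, 0) + parse_mem_kb(row[4])
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Illustrative sample only, not taken from a real system.
SAMPLE = '''"Image Name","PID","Session Name","Session#","Mem Usage"
"qvs.exe","4120","Services","0","14,680,064 K"
"qvb.exe","5310","Services","0","2,097,152 K"
"qvb.exe","5388","Services","0","1,048,576 K"
"svchost.exe","812","Services","0","35,120 K"
'''

if __name__ == "__main__":
    for name, kb in top_memory_processes(SAMPLE):
        print(f"{name}: {kb / 1024 / 1024:.1f} GB")
```

If qvs.exe tops the list, carry on with the steps below. If the qvb.exe processes do, you are looking at a reload problem instead.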
Memory usage increases gradually over time and stays stable at or around the configured Low Working Set Limit.
This is expected behaviour.
We can confirm what the Working Set is configured as in: QlikView Management Console > System > Setup > QlikView Servers > QVS@SERVER > Performance
So, it'll look a bit like:
Memory usage increases gradually over time, is stable at or around the Low Working Set Limit or at the High Limit. Users are experiencing a negative performance impact.
This may indicate that the current setup needs to be reevaluated and that more resources need to be made available, or an additional QlikView Server node needs to be added to the environment. The below steps may still be applied to find possible problem documents or objects that could be optimized.
May also look like the graph above.
Memory usage increases suddenly and leads to performance impact or QlikView Server Service crashes
Boom. Unexpected behaviour.
Often looks somewhat like this:
In this example, one document took up the majority of the available RAM, and another was loaded afterwards, tipping qvs.exe to 100% memory usage.
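To make the three scenarios above a bit more concrete, here is a minimal Python sketch that sorts a series of qvs.exe memory samples into one of them. The function name, the `spike_gb` threshold and the sample values are assumptions made for illustration; none of them are QlikView settings. Whether users are impacted has to come from the users themselves, so it is passed in as a flag.

```python
def classify_memory_trend(samples_gb, high_limit_gb, users_impacted, spike_gb=1.0):
    """Classify memory samples (in GB) taken at regular intervals.

    spike_gb is a hypothetical threshold: growth between two consecutive
    samples larger than this is treated as sudden rather than gradual.
    """
    if not samples_gb:
        return "no data"
    # Largest increase between any two consecutive samples.
    biggest_jump = max(
        (later - earlier for earlier, later in zip(samples_gb, samples_gb[1:])),
        default=0.0,
    )
    peak = max(samples_gb)
    if biggest_jump > spike_gb and peak >= high_limit_gb:
        return "unexpected: sudden spike to the high limit or beyond"
    if users_impacted:
        return "capacity: re-evaluate resources or add a node"
    return "expected: gradual growth"


if __name__ == "__main__":
    gradual = [2.0, 2.5, 3.0, 3.4, 3.5, 3.5]
    sudden = [2.0, 2.1, 4.6, 4.9]
    print(classify_memory_trend(gradual, 4.5, users_impacted=False))
    print(classify_memory_trend(gradual, 4.5, users_impacted=True))
    print(classify_memory_trend(sudden, 4.5, users_impacted=False))
```

The right value for `spike_gb` depends on how often you sample and how large your documents are, so treat the default as a starting point only.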
Identifying the problem and root cause
Next, we need to start analyzing a few specific log files, and for that, we need to first identify 3 things:
when the resource allocation problem started,
when it reached its peak,
and what actions were being carried out against the QlikView Server engine at that time.
The When can be identified post mortem (after the fact) by looking at when issues were reported, and hopefully by catching errors and warnings logged in the QlikView Server Event logs. But since we want to make sure we are prepared for the next time this happens, we usually recommend setting up Resource Monitoring.
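Once resource monitoring is in place, finding the start and the peak becomes a small exercise. The sketch below assumes you have exported timestamped memory readings for qvs.exe from whatever monitoring tool you use. The function name, the threshold and the sample readings are hypothetical.

```python
from datetime import datetime


def find_onset_and_peak(samples, threshold_gb):
    """Return (onset_time, peak_time) for (datetime, memory_gb) samples in time order.

    onset_time is the first sample above threshold_gb, or None if the
    threshold is never exceeded. peak_time is the time of the highest reading.
    """
    onset = next((ts for ts, gb in samples if gb > threshold_gb), None)
    peak = max(samples, key=lambda sample: sample[1])[0] if samples else None
    return onset, peak


# Illustrative readings only, not taken from a real server.
SAMPLES = [
    (datetime(2024, 1, 15, 9, 0), 3.1),
    (datetime(2024, 1, 15, 9, 5), 3.2),
    (datetime(2024, 1, 15, 9, 10), 4.3),
    (datetime(2024, 1, 15, 9, 15), 4.7),
    (datetime(2024, 1, 15, 9, 20), 4.6),
]

if __name__ == "__main__":
    onset, peak = find_onset_and_peak(SAMPLES, threshold_gb=4.2)
    print("Problem started around:", onset)
    print("Problem peaked around:", peak)
```

The two timestamps give you the window to look at in the log files, which answers the third question: what was happening on the server at that time.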
This is where we roll our sleeves up and start digging.
The QlikView Server Service has four log files that are crucial for identifying possible issues, and I will touch briefly on all of them. For more details on logs, check out this “How to collect QlikView log files” article.
Default storage: C:\ProgramData\QlikTech\QlikViewServer (you might have changed that, check the Management Console)
Configurable in the QlikView Management Console > System > Setup > QlikView Servers > QVS@SERVER > Logging
This includes engine activity. How much we will be able to read depends on log verbosity.
I like using this one to pinpoint crashes easily, as it will show when the service starts up. A glance at the memory statistics can also help identify how quickly we consumed it all. Generally, we like throwing this into a QlikView or Sense app and looking at pretty graphs.
Logs user actions, such as the opening of documents, opening of sheets, bookmark selections, exports, etc.
This is what we need when we suspect user actions to be responsible for the behaviour, such as someone attempting to export a table that pulls every last bit of data from the document, or a user creating an object with an expression that causes an exception in the engine and crashes it.
Records closed sessions server-wide. Sessions closed due to QlikView Server Service restarts should also be logged. If sessions are unaccounted for, this needs to be noted too, as it will indicate a service crash.
New! Starting from QlikView November 2018/version 12.30, you now have the possibility to capture granular usage metrics from the Qlik in-memory engine based on configurable thresholds. This provides the ability to capture CPU and RAM utilization of individual chart objects, CPU and RAM utilization of reload tasks, and more.
This log is not enabled by default, so please follow the instructions provided here to enable it.
! Be very careful when enabling this, as it can generate a lot of logging information very quickly.
Here are some of the problems that can be identified through these log files (the list will be updated in the companion article).
What do we do with the data that we find or what are we looking for?
We are looking for:
What documents are being loaded and by what users?
Are any documents being uploaded to the server by the QDS at the same time? During peak hours, this can lead to stability issues if the system is already heavily loaded.
What actions are being carried out by the users just before the crash or sudden peak in memory usage? If you want more information on how to trace a user through the entire system, this article might be helpful.
For example, we might see:
Information Server: Document Load: Beginning open of document
Information System: Document Load - ODE1: Document \\path\\doc.QVW, AuthenLev(1). Authuser()
Information DOC loading: Beginning load of document \\path\\doc.QVW.
Warning WorkingSet: Virtual Memory is growing beyond parameters - 4.308(4.200) GB
Warning WorkingSet: Virtual Memory is growing beyond parameters - 4.688(4.200) GB
Warning WorkingSet: Virtual Memory is growing beyond parameters - 4.711(4.200) GB
There were no memory alerts prior to the load of doc.QVW, so we can start with this one.
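The same reasoning can be scripted when there are too many lines to read by eye. The Python sketch below walks through log lines in order and reports which document was loaded most recently before the first WorkingSet warning. It only matches the message text as shown in the excerpt above. Real log files carry timestamps and further columns, so the patterns are an assumption and may need adjusting, and the function name is made up for illustration.

```python
import re

# Patterns based on the message text in the excerpt above (an assumption).
LOAD_RE = re.compile(r"Beginning load of document (?P<doc>.+?)\.?\s*$")
WARN_RE = re.compile(
    r"WorkingSet: Virtual Memory is growing beyond parameters - "
    r"(?P<used>[\d.]+)\((?P<limit>[\d.]+)\) GB"
)


def first_suspect(lines):
    """Return (document, used_gb, limit_gb) for the first memory warning.

    document is the one most recently loaded before that warning, or None
    if no load was seen. Returns None when there is no warning at all.
    """
    last_loaded = None
    for line in lines:
        load = LOAD_RE.search(line)
        if load:
            last_loaded = load.group("doc")
            continue
        warn = WARN_RE.search(line)
        if warn:
            return last_loaded, float(warn.group("used")), float(warn.group("limit"))
    return None


# The lines from the excerpt above, kept as raw strings.
EXCERPT = [
    r"Information Server: Document Load: Beginning open of document",
    r"Information System: Document Load - ODE1: Document \\path\\doc.QVW, AuthenLev(1). Authuser()",
    r"Information DOC loading: Beginning load of document \\path\\doc.QVW.",
    r"Warning WorkingSet: Virtual Memory is growing beyond parameters - 4.308(4.200) GB",
    r"Warning WorkingSet: Virtual Memory is growing beyond parameters - 4.688(4.200) GB",
]

if __name__ == "__main__":
    print(first_suspect(EXCERPT))
```

A match here is a starting point for the review, not proof. Another document that was already in memory may still be the real cause.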
Depending on our findings, we may then move on to:
A review of active QlikView documents
Documents that are active during the day, and while the problem is observed, can be individually reviewed for their basic memory footprint.
An example would be to open the documents individually in the QlikView Desktop client.
If an object or sheet was already identified by using the log files above, this can be reviewed directly.
It is also possible to get an overview of calculation times and memory usage of individual objects:
Open the document in QlikView Desktop and go to Settings > Document Properties...
Go to the Sheets tab and in the list of objects, review the Calc Time and Memory data for each object.
If a Document or Object was identified:
Carry out optimization with the assistance of the original developer of the document.
Are other services hosted on the same machine? The QVS.exe does not like sharing.
If the QlikView Server Service shares a host with the QlikView Distribution Service, the Distribution Service could potentially be taking resources from the qvs.exe. A separate machine for QlikView Distribution Service may be necessary.
Adjust when documents are released from memory:
The default value is set to 8 hours. Configure this in the QlikView Management Console > System > Setup > QlikView Servers > QVS@servername > Documents > Document Timeout value.
Configure the QlikView engine to clear cached data:
This is only recommended as an interim step while other actions are taken, either to scale the system correctly or to optimize the QlikView documents.