Qlik Community

Qlik Support Updates Blog

Important and useful support information about end-of-product support, new service releases, and general support topics.

Sonja_Bauernfeind

Troubleshooting a QlikView Server: Performance and crashes


When the QlikView AccessPoint starts showing No Server, or end users are reporting that they are being kicked out of their applications while they’re working in them, it’s often the QlikView server (our QIX engine) that’s to blame.

It might have crashed. Or it might have simply run out of resources, ramping up RAM and CPU usage until the entire system eventually crashes.

So, what do we do in a situation like this?

First, we figure out if something is wrong, or if the QlikView server has simply outgrown the currently available resources. Like anything else, as usage grows, demand grows, and what we currently have available just isn’t enough anymore. A bit like a plant outgrowing its pot.

This blog post is meant to outline the troubleshooting steps to identify just that, but before we can get started we need to look at how QlikView (or our QIX engine in general) uses resources.

We have material available for extended reading here: QlikView Memory Management (including an excellent video summary that I really need to recommend) and here: Qlik Engine Memory Management.

The TLDR is:

  • Increased memory (RAM) usage up to the configured Working Set Limit is expected. The increase should be gradual over time.
  • An immediate problem can be identified if there is a sudden spike to the High Working Set Limit or beyond.  
  • If the QlikView Server Service is hosted in a virtual environment, resources need to be dedicated. 
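To make the first two points concrete, here is a minimal sketch that places a memory reading relative to the two limits. It assumes the limits are expressed as percentages of total RAM; the 70 and 90 defaults, the function name and the sample numbers are only illustrative, so substitute the values configured in your own Management Console.

```python
def classify_memory(used_gb, total_ram_gb, low_pct=70, high_pct=90):
    """Place a qvs.exe memory reading relative to the Working Set limits.

    low_pct and high_pct are assumptions for illustration; use the values
    configured for your own QlikView Server.
    """
    low_gb = total_ram_gb * low_pct / 100
    high_gb = total_ram_gb * high_pct / 100
    if used_gb < low_gb:
        return "below Low limit"
    if used_gb < high_gb:
        return "between Low and High limit"
    return "at or above High limit"


# Illustrative readings on a hypothetical 64 GB server.
for reading in (40, 50, 60):
    print(reading, "GB:", classify_memory(reading, 64))
```

A reading that creeps up to the Low limit over a working day is the expected case; one that jumps straight into the last category is the one worth investigating.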

But how do we identify an actual problem vs an expected behaviour?

Here’s how we usually do this at support.

 

What is using the resources? 

I know, I already blamed the engine from the get-go, but we do need to make sure we aren’t pointing fingers at the wrong culprit. So, if the host operating system is running out of resources, we first want to make 100% sure that it’s the qvs.exe that’s at fault. This can be determined either by monitoring the Windows Task Manager\Processes tab directly while the problem occurs or maybe it was previously identified by a resource monitoring tool, such as Windows Performance Monitor.

If it turns out to be a qvb.exe (if the Distribution Service is on the same machine), then this gets a little easier, since it’s a reload that’s causing the problem. Troubleshooting this is, sadly, not covered in this post though. Maybe another time?

 

Identify if:

Memory usage increases gradually over time and stays stable at or around the configured Low Working Set Limit.

This is expected behaviour.

We can confirm what the Working Set is configured as in: QlikView Management Console > System > Setup > QlikView Servers > QVS@SERVER > Performance

So, it'll look a bit like:

[Image: graph yes okay.png]

 

Memory usage increases gradually over time, is stable at or around the Low Working Set Limit or at the High Limit. Users are experiencing a negative performance impact.

This may indicate that the current setup needs to be reevaluated and that more resources need to be made available, or an additional QlikView Server node needs to be added to the environment. The below steps may still be applied to find possible problem documents or objects that could be optimized.

This may also look like the graph above.

Memory usage increases suddenly and leads to performance impact or QlikView Server Service crashes

Boom. Unexpected behaviour.

Often looks somewhat like this:

[Image: graph yes not okay.png]

In this example, one document took up the majority of the available RAM, and another was loaded afterwards, tipping qvs.exe to 100% memory usage.

 

Identifying the problem and root cause

Next, we need to start analyzing a few specific log files, and for that, we need to first identify 3 things:

  • when the resource allocation problem started,
  • when it reached its peak,
  • and what actions were being carried out against the QlikView Server engine at that time.

The When can be identified post mortem (after the fact) by looking at when issues were reported, and hopefully by catching errors and warnings logged in the QlikView Server Event logs.  But since we want to make sure we are prepared for the next time this happens, we usually recommend setting up Resource Monitoring.

 

Capture Resource Usage (Windows)

Configuring resource monitoring or performance monitoring in Windows to get an overview of the situation is always recommended. See How to set up performance monitoring for QlikView Server Service(QVS) (perfmon) for a simple example.
 

Review native QlikView log files

This is where we roll our sleeves up and start digging.

The QlikView Server Service has four log files that are crucial for identifying possible issues, and I will touch briefly on all of them. For more details on logs, check out this “How to collect QlikView log files” article.  

Some basics:

  • Default storage: C:\ProgramData\QlikTech\QlikViewServer (you might have changed that, check the Management Console)
  • Configurable in the QlikView Management Console > System > Setup > QlikView Servers > QVS@SERVER > Logging 

 

Events_SERVER.log

This includes engine activity. How much we will be able to read depends on log verbosity. 

  • It will log, for example, WorkingSet warnings. See What does "Warning WorkingSet: Virtual Memory is . . . ." mean? This is what we’d expect if the qvs.exe itself is overloading the machine.
  • It can give details on what documents are being loaded, what documents are being uploaded from the Distribution Service, or related actions to document activities. 
  • It will log exceptions and crashes
  • For best results in finding the root cause of resource issues, set the Log Verbosity to High and split 


Performance_SERVER.log

Includes performance information for the QlikView Server Service only. No other services or components will be logged. 

  • Only logs every 5 minutes by default.
  • Logs memory statistics in the following columns:
VMCommitted(MB)    VMAllocated(MB)    VMFree(MB)    VMLargestFreeBlock(MB)

I like using this one to pinpoint crashes easily, as it will show when the service starts up. A glance at the memory statistics can already help identify how quickly we consumed it all. Generally, we like throwing this into a QlikView or Sense app and looking at pretty graphs.
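If you would rather script a first pass before building an app, the sketch below looks for sudden jumps in VMCommitted(MB) between consecutive samples. It assumes the log is tab-separated with a header row and a Timestamp column; the function name, the 1024 MB threshold and the sample rows are invented for illustration, so check the column names against your own Performance_SERVER.log.

```python
import csv
import io


def find_memory_jumps(log_text, jump_mb=1024):
    """Return (timestamp, previous_mb, current_mb) for every sample whose
    VMCommitted(MB) grew by more than jump_mb since the previous sample."""
    reader = csv.DictReader(io.StringIO(log_text), delimiter="\t")
    jumps = []
    previous = None
    for row in reader:
        current = float(row["VMCommitted(MB)"])
        if previous is not None and current - previous > jump_mb:
            jumps.append((row["Timestamp"], previous, current))
        previous = current
    return jumps


# Invented sample data, one row per 5-minute logging interval.
sample_performance = (
    "Timestamp\tVMCommitted(MB)\n"
    "2019-01-10 09:00:00\t2100\n"
    "2019-01-10 09:05:00\t2150\n"
    "2019-01-10 09:10:00\t4300\n"
    "2019-01-10 09:15:00\t4350\n"
)

jumps = find_memory_jumps(sample_performance)
for stamp, before, after in jumps:
    print(f"{stamp}: {before:.0f} MB -> {after:.0f} MB")
```

The timestamps this returns are the ones to carry over into the Events and Audit logs.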


Audit_SERVER.log

Logs user actions, such as the opening of documents, opening of sheets, bookmark selections, exports, etc. 

C:/ProgramData/QlikTech/Documents/movies database.qvw    SendToExcel    DOMAIN\Administrator    Sheet Object    Document\LB04

This is what we need when we suspect user actions to be responsible for the behaviour, such as someone attempting to export a table that pulls every last bit of data from the document, or a user creating an object with an expression that causes an exception in the engine and crashes it.
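Once you know when memory peaked, a short script can summarise what users were doing just before. This is a rough sketch only: it assumes a tab-separated audit log with a header row containing Timestamp, Document, Type and User columns, and a timestamp format of YYYY-MM-DD HH:MM:SS. The function name and the sample rows are made up, so adjust the column names and format to match your own Audit_SERVER.log.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timedelta


def actions_before(log_text, peak, minutes=10, fmt="%Y-%m-%d %H:%M:%S"):
    """Count audit entries per (user, document, action type) in the
    `minutes` leading up to `peak` (inclusive)."""
    start = peak - timedelta(minutes=minutes)
    counts = Counter()
    for row in csv.DictReader(io.StringIO(log_text), delimiter="\t"):
        stamp = datetime.strptime(row["Timestamp"], fmt)
        if start <= stamp <= peak:
            counts[(row["User"], row["Document"], row["Type"])] += 1
    return counts


# Invented sample data; the column names are assumptions.
sample_audit = (
    "Timestamp\tDocument\tType\tUser\n"
    "2019-01-10 09:02:00\tsales.qvw\tSelection\tDOMAIN\\anna\n"
    "2019-01-10 09:08:00\tsales.qvw\tSendToExcel\tDOMAIN\\anna\n"
    "2019-01-10 09:12:00\tsales.qvw\tSendToExcel\tDOMAIN\\anna\n"
    "2019-01-10 09:14:00\tstock.qvw\tSelection\tDOMAIN\\ben\n"
    "2019-01-10 09:20:00\tstock.qvw\tSelection\tDOMAIN\\ben\n"
)

counts = actions_before(sample_audit, datetime(2019, 1, 10, 9, 15))
for (user, document, action), n in counts.most_common():
    print(user, document, action, n)
```

Repeated exports of the same document by the same user right before the peak would be the first thing to follow up on.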


Session_SERVER.log

Records closed sessions server-wide. Sessions closed due to QlikView Server Service restarts should also be logged. If sessions are unaccounted for, this needs to be noted too, as it will indicate a service crash.
 

QIX_performance.log

New! Starting from QlikView November 2018/version 12.30, you now have the possibility to capture granular usage metrics from the Qlik in-memory engine based on configurable thresholds.  This provides the ability to capture CPU and RAM utilization of individual chart objects, CPU and RAM utilization of reload tasks, and more.

This log is not enabled by default, so please follow the instructions provided here to enable it.

! Be very careful when enabling this, as it can generate a lot of logging information very quickly.

Examples:

Here are some problems that can be identified through these log files (the list will be updated in the companion article).

What does "Warning WorkingSet: Virtual Memory is . . . ." mean?
Unable To Export Large Objects To Excel After Upgrading to November 2017
AAALR greater than row applicator messages
AccessPoint slow to load due to too many files or folders mounted 

 

What do we do with the data that we find or what are we looking for?

We are looking for:

  • What documents are being loaded and by what users?
  • Are any documents being uploaded to the server by the QDS at the same time? During peak hours, this can lead to stability issues if the system is already heavily loaded.
  • What actions are being carried out by the users just before the crash or sudden peak in memory usage? If you want more information on how to trace a user through the entire system, this article might be helpful.

For example, we might see:

Information  Server: Document Load: Beginning open of document
Information  System: Document Load - ODE1: Document \\path\\doc.QVW, AuthenLev(1). Authuser()
Information  DOC loading: Beginning load of document \\path\\doc.QVW.
Warning       WorkingSet: Virtual Memory is growing beyond parameters - 4.308(4.200) GB
Warning       WorkingSet: Virtual Memory is growing beyond parameters - 4.688(4.200) GB
Warning       WorkingSet: Virtual Memory is growing beyond parameters - 4.711(4.200) GB 

There were no memory alerts prior to the load of doc.QVW, so we can start with this one.
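The same reasoning can be scripted for larger logs. The sketch below walks through event lines, remembers the most recent document load, and stops at the first WorkingSet warning. The regular expressions are written against the message texts shown above, and the function name is invented; real lines will also carry timestamps and other columns, which the search ignores.

```python
import re

LOAD_RE = re.compile(r"Beginning load of document (?P<doc>.+?)\.?\s*$")
WARN_RE = re.compile(
    r"WorkingSet: Virtual Memory is growing beyond parameters - "
    r"(?P<current>\d+\.\d+)\((?P<limit>\d+\.\d+)\) GB"
)


def first_warning_after_load(lines):
    """Return (document, current_gb, limit_gb) for the first WorkingSet
    warning, paired with the last document load seen before it.
    Returns None if no warning is found."""
    last_doc = None
    for line in lines:
        load = LOAD_RE.search(line)
        if load:
            last_doc = load.group("doc")
            continue
        warn = WARN_RE.search(line)
        if warn:
            return last_doc, float(warn.group("current")), float(warn.group("limit"))
    return None


# Message texts taken from the example above.
sample_events = [
    r"Information  Server: Document Load: Beginning open of document",
    r"Information  DOC loading: Beginning load of document \\path\\doc.QVW.",
    r"Warning       WorkingSet: Virtual Memory is growing beyond parameters - 4.308(4.200) GB",
    r"Warning       WorkingSet: Virtual Memory is growing beyond parameters - 4.688(4.200) GB",
]

result = first_warning_after_load(sample_events)
print(result)
```

Keep in mind that the last document loaded before a warning is only a starting point; a document loaded earlier may already have used most of the memory.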

Depending on our findings, we may then move on to:

A review of active QlikView documents

Documents that are active during the day and while the problem is observed can be individually reviewed for their basic memory footprint.

  • An example would be to open the documents individually in the QlikView Desktop client. 
  • If an object or sheet was already identified by using the log files above, this can be reviewed directly.
  • It is also possible to get an overview of calculation times and memory usage of individual objects:
    • Open the document in QlikView Desktop and go to Settings > Document Properties...
    • Go to the Sheets tab and in the list of objects, review the Calc Time and Memory data for each object.

If a Document or Object was identified:

  • Carry out optimization with the assistance of the original developer of the document.
  • If a shared object/user object was identified, the object can be deleted in the QlikView Management Console. See QlikView: Manage User Objects and Shared Objects.

We do have some guidelines though that can be applied to make sure the server is configured for the best possible stability and performance. 

For general optimization of the server: 

  • Is QlikView deployed on a virtual or physical machine?
  • Configure the Server BIOS optimally:
  • Are other services hosted on the same machine? The QVS.exe does not like sharing.
    • If the QlikView Server Service shares a host with the QlikView Distribution Service, the Distribution Service could potentially be taking resources from the qvs.exe. A separate machine for QlikView Distribution Service may be necessary.
  • Adjust when documents are released from memory:
    • The default value is set to 8 hours. Configure this in the QlikView Management Console > System > Setup > QlikView Servers > QVS@servername > Documents > Document Timeout value.
  • Configure the QlikView engine to clear cached data:
  • Is the Page File on the machine configured correctly?
    • A page file that is too small may result in performance issues, even if the QlikView Server memory management seems to be working as expected. Allow Windows to choose the Page File size.

And that is it, for the most part. It’s a guide that can be built on further for any kind of troubleshooting of the QlikView Server service, and I hope that it was informative.

I can only highly recommend taking the findings from your investigations and going through our ever-growing public KB with the symptoms and error messages you might have located.

If you have suggestions or questions, please drop them in here. We’re excited to hear from you.

 

 

1 Comment
Or

I'll chime in with one more tip that I've found very useful over the years. If your QlikView server has suddenly gone off the deep end for resource use and you haven't made any significant changes to your deployment, check for new user / shared objects. It's quite easy for a user to create a completely nonsensical visualization - for example, a 100,000 x 100,000 pivot table - that drives the server to 100% resources for minutes at a time. Unfortunately, there doesn't seem to be any way to prevent this either, but if you can locate the offending object, you can at least delete it and work with the user in question to create a more reasonable object for their use case.
