Croisfelt
Contributor

Qlik Sense engine crashes after accumulating cache in server's RAM memory

Hello Qlik Community,

The engine service on our Qlik Sense Enterprise cluster crashes and restarts unexpectedly about once a week. After months of testing possible solutions with our local Qlik reseller and comparing notes with other companies that also use Qlik Sense Enterprise, we are coming to the conclusion that Qlik Sense is failing to purge its cache from RAM in a clustered architecture.

Some context:

  • We had the following nodes in our architecture:
    • 1 virtual machine running the central node with QMC and PostgreSQL (Windows Server 2012);
    • 1 virtual machine running the production proxy/hub (Windows Server 2019);
    • 1 virtual machine running the proxy/hub for app creation/dev (Windows Server 2019);
    • 2 virtual machines running only scheduled reload tasks (Windows Server 2019);
    • 2 other virtual machines used only for the qvd/qvf file share (Windows Server 2019).
  • Both the production and dev hub engines crash, restart and disconnect all users;
  • The RAM purge issue started after we updated Qlik Sense Enterprise to the November 2021 release.

Some actions and findings under supervision of our Qlik reseller:

  • Applied the latest Qlik Sense November 2021 patch on all nodes;
  • Reduced 'App cache time' in the engine settings in the QMC to 1 hour;
  • Resized the CPU and RAM of all machines based on Telemetry dashboard analysis;
  • Set up machine resource monitoring with Zabbix and Grafana on all machines, and later extended it to the Qlik Sense services and the PostgreSQL database;
  • Analyzed the Windows Server Event Viewer and the Qlik Sense log files with the support of our Qlik reseller;
  • Tested the [ClearCacheTimesPerDay] setting in the engine's Settings.ini with different values;
  • Changed the Windows version of the proxy/hub machines to match the QMC machine (Windows Server 2012);
  • Then moved the QMC and the production hub onto a single node (the central one). This greatly reduced incidents but did not fully solve the issue.

Even so, the engine eventually crashes on the central and dev-hub nodes (disconnecting all hub users), always with RAM usage climbing in a pattern similar to the image below:

engine craches.png

 

Because of this problem, we cannot run two proxy/hub machines in the cluster, a setup that would reduce the risk of unavailability: if one server crashed, the second one would keep users connected.

Talking to IT professionals at other companies, it seems that this case recurs among some of Qlik's customers.

 

Has anyone else seen such behavior in the Qlik Sense engine? Do you have any other solutions to recommend?

 


Thank you for your support.


 

 

 

2 Replies
Croisfelt
Contributor
Author

It just happened again. [ClearCacheTimesPerDay] is set to 3 in the engine's Settings.ini, so the cache is purged from RAM three times a day, at 5:00, 13:00 and 21:00. Last night, however, it was not cleared, and memory was fully used this morning during the first business hours, when the company's administrative users started using Qlik Sense. The Qlik Sense engine service then restarted unexpectedly.


no_memory_release.png
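To make the missed purge easier to spot in monitoring data, here is a minimal Python sketch. It assumes the purge slots are evenly spaced over 24 hours starting at 05:00, which is only inferred from the times reported above (5:00, 13:00, 21:00) and is not documented engine behavior. The function names `purge_times` and `missed_purges` and the 30-minute tolerance are hypothetical.

```python
from datetime import datetime, timedelta


def purge_times(times_per_day, first="05:00"):
    """Return expected purge slots as HH:MM strings.

    Assumption: slots are evenly spaced over 24 h starting at `first`.
    """
    if times_per_day <= 0:
        return []
    start = datetime.strptime(first, "%H:%M")
    step = timedelta(hours=24) / times_per_day
    return [(start + i * step).strftime("%H:%M") for i in range(times_per_day)]


def _minutes(hhmm):
    """Convert an HH:MM string to minutes since midnight."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes


def missed_purges(expected, observed_drops, tolerance_min=30):
    """List expected slots with no observed memory drop within the tolerance."""
    observed = [_minutes(t) for t in observed_drops]
    missed = []
    for slot in expected:
        s = _minutes(slot)
        # Compare on a 24 h circle so 23:50 and 00:10 count as 20 minutes apart.
        near = any(min(abs(s - o), 1440 - abs(s - o)) <= tolerance_min
                   for o in observed)
        if not near:
            missed.append(slot)
    return missed


if __name__ == "__main__":
    slots = purge_times(3)
    # Memory drops seen in the monitoring graphs (illustrative values only).
    print(missed_purges(slots, ["05:10", "13:02"]))
```

Feeding it the times at which Zabbix or Grafana actually shows memory being released would list the slots where the purge did not seem to happen.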

 

 

 

 

NadiaB
Support

Hi @Croisfelt 

Engine crashes are not expected behavior. Have you had a chance to look at the engine logs to see whether an error is recorded at the time the issue occurs?

Did you find anything in the Windows logs?

When the engine crashes, RAM consumption usually drops to 0, which does not seem to be the case in the images above. Have you seen any error indicating that the engine crashed, or is the crash being assumed because the chart shows resources being released?
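One way to apply this distinction to exported monitoring samples is sketched below in Python. It labels a sharp fall to near zero as a possible restart and any other sharp fall as a partial release. The thresholds (`drop_ratio`, `restart_floor`) and the function name are assumptions chosen for illustration, and the samples should be the engine process's memory as a fraction of total RAM, not whole-machine usage.

```python
def classify_drops(samples, drop_ratio=0.5, restart_floor=0.05):
    """Classify sharp memory drops between consecutive samples.

    samples: list of (timestamp_label, used_fraction) in time order.
    A drop counts when usage falls by at least `drop_ratio` of the
    previous value. If the new value is at or below `restart_floor`,
    it is labeled a possible restart, otherwise a partial release.
    Both thresholds are illustrative assumptions.
    """
    events = []
    for (_, prev), (label, cur) in zip(samples, samples[1:]):
        if prev > 0 and cur <= prev * (1 - drop_ratio):
            kind = "possible restart" if cur <= restart_floor else "partial release"
            events.append((label, kind))
    return events


if __name__ == "__main__":
    # Illustrative values only, not taken from the attached graphs.
    history = [
        ("08:00", 0.60), ("09:00", 0.95), ("09:05", 0.02),
        ("10:00", 0.40), ("11:00", 0.90), ("11:05", 0.35),
    ]
    for label, kind in classify_drops(history):
        print(label, kind)
```

A "possible restart" event is only a hint; it still needs to be confirmed against the Windows event log and the engine logs.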

Do you see the same resource consumption pattern in the Operations Monitor app?

If the engine is crashing, there should be something in the logs that explains where the issue lies.
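A small Python sketch for narrowing the logs to the minutes around an incident follows. It assumes a simplified, hypothetical line layout of `<ISO 8601 timestamp> <LEVEL> <message>`; real Qlik Sense engine logs use their own format, so the parsing step would need adapting. The function name `lines_near` and the 10-minute window are also illustrative.

```python
from datetime import datetime, timedelta


def lines_near(lines, incident, window_min=10,
               levels=("WARN", "ERROR", "FATAL")):
    """Return log lines of the given levels within +/- window_min of incident.

    Assumed (hypothetical) line layout: "<ISO timestamp> <LEVEL> <message>".
    Lines that do not match this layout are skipped.
    """
    earliest = incident - timedelta(minutes=window_min)
    latest = incident + timedelta(minutes=window_min)
    hits = []
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue
        try:
            stamp = datetime.fromisoformat(parts[0])
        except ValueError:
            continue  # not a timestamped line in the assumed layout
        if earliest <= stamp <= latest and parts[1].upper() in levels:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "2022-01-10T08:40:00 INFO session opened",
        "2022-01-10T09:01:30 WARN memory above working set limit",
        "2022-01-10T09:04:55 ERROR allocation failed",
        "2022-01-10T11:00:00 ERROR unrelated later error",
    ]
    for hit in lines_near(sample, datetime(2022, 1, 10, 9, 5)):
        print(hit)
```

Running it with the restart time taken from the monitoring graph gives a short list of candidate lines to share with support.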

Hope it helps. 

Don't forget to mark as "Solution Accepted" the comment that resolves the question/issue. #ngm