Jens_Argentzell
Employee

Quick tips #8 - Server settings for best performance

This page describes the settings for best performance for servers running the Qlik Associative Engine.
Latest update: May 2022

Windows 2022 has improved performance for servers with many physical cores. The following table shows the core-count definitions used in this document.

 

                              | Older Windows versions | Windows 2022
Server with normal core count | ≤64 physical cores     | ≤90 physical cores
Server with large core count  | >64 physical cores     | >90 physical cores
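
The definitions above can be expressed as a small helper. This is only a sketch: the function name `core_count_class` and the boolean flag `windows_2022` are illustrative assumptions, and the thresholds (64 and 90 physical cores) are taken directly from the table.

```python
def core_count_class(physical_cores: int, windows_2022: bool) -> str:
    """Classify a server as 'normal' or 'large' core count.

    Thresholds come from the table: up to 64 physical cores counts as
    normal on older Windows versions, up to 90 on Windows 2022.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be a positive integer")
    threshold = 90 if windows_2022 else 64
    return "normal" if physical_cores <= threshold else "large"


# The same 72-core server falls into different classes depending on the OS.
print(core_count_class(72, windows_2022=False))  # large
print(core_count_class(72, windows_2022=True))   # normal
```

Note that the input is the number of physical cores, not logical processors, so hyper-threading should not be counted.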

 

BIOS settings

Setting | Value

Hyper-threading

Applies to QlikView and Qlik Sense servers:

  • Server with normal core count: Enabled
  • Server with large core count: Disabled

There are use cases where enabling hyper-threading is beneficial even on servers with a large core count. It is therefore best to test these settings with your application.

Power Management (System Profile Settings)

Applies to QlikView and Qlik Sense servers:

  • Custom with Max performance and C states enabled

Another option is the full performance setting. However, this setting makes the server run constantly at the maximum clock speed on all cores, which has the following drawbacks:

  • The server uses more power.
  • The CPUs do not use clock speeds higher than the speed of the all-core boost clock, which usually is lower than the maximum boost clock speed of the CPUs.

A solution to this is to use a custom system profile in the server BIOS that allows the CPUs to use their C states while all other components are set to full performance. The custom system profile should be set up similar to the following:

  • CPU power management: Max performance
  • Turbo boost: Enabled
  • Energy efficient turbo: Disabled
  • C states: Autonomous (if available, otherwise Enabled)
  • C1E state: Enabled
  • Uncore frequency: Max
  • Memory frequency: Max
  • Energy efficiency policy: Performance
  • Determinism slider: Power determinism

NUMA

QlikView servers (Intel):

  • Disabled*

Qlik Sense servers (Intel):

  • Server with normal core count: Disabled*
  • Server with large core count: Enabled

*On servers with Intel CPUs, NUMA is disabled by enabling Node Interleaving.

QlikView and Qlik Sense servers (AMD EPYC):

  • NUMA mode should be set to Automatic and 1 node per socket. (L3 Cache NUMA Nodes disabled.)

Memory configuration

QlikView and Qlik Sense servers:

  • Configured for best performance (the DIMM slots for every CPU should be populated in accordance with the server manufacturer's specification for best performance)

Hardware/Software Prefetcher

QlikView and Qlik Sense servers:

  • Enabled

 

The names of the settings and how to tune them may differ depending on the server manufacturer and model. Refer to the documentation for your server to find the equivalents of the settings listed above.
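
As a quick reference, the hyper-threading and NUMA rows above can be condensed into a lookup function. This is a minimal sketch, not an official tool: the function name `recommended_bios` and the string labels for its arguments are assumptions made for illustration, while the returned values mirror the recommendations listed above.

```python
def recommended_bios(product: str, cpu: str, core_class: str) -> dict[str, str]:
    """Return the hyper-threading and NUMA values from the table above.

    product:    "QlikView" or "Qlik Sense"
    cpu:        "Intel" or "AMD EPYC"
    core_class: "normal" or "large" (as defined at the top of this page)
    """
    if product not in ("QlikView", "Qlik Sense"):
        raise ValueError(f"unknown product: {product!r}")
    if cpu not in ("Intel", "AMD EPYC"):
        raise ValueError(f"unknown CPU family: {cpu!r}")
    if core_class not in ("normal", "large"):
        raise ValueError(f"unknown core class: {core_class!r}")

    # Hyper-threading depends only on the core-count class.
    hyper_threading = "Enabled" if core_class == "normal" else "Disabled"

    if cpu == "AMD EPYC":
        numa = "Automatic, 1 node per socket, L3 Cache NUMA Nodes disabled"
    elif product == "Qlik Sense" and core_class == "large":
        numa = "Enabled"
    else:
        # On Intel, NUMA is disabled by enabling Node Interleaving.
        numa = "Disabled (enable Node Interleaving)"

    return {"Hyper-threading": hyper_threading, "NUMA": numa}


print(recommended_bios("Qlik Sense", "Intel", "large"))
```

The remaining settings (power management, memory configuration, prefetcher) are the same for every combination, so they are left out of the lookup.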

Operating system settings

Microsoft Windows

Setting | Value

Power plan

QlikView and Qlik Sense servers:

  • High Performance

Registry update

Qlik Sense servers only:

For servers with a large core count, there is a registry change, applicable to both Intel and AMD CPUs, that improves the responsiveness when the Qlik Sense Repository Service (QRS) is under heavy load (for example, when many users open the hub at the same time).

Two registry updates are needed:

Add Thread_NormalizeSpinWait as a DWORD value under the following subkey: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework

  • Value name: Thread_NormalizeSpinWait
  • Value data: 1

Add Switch.System.Threading.UseNetCoreTimer as a String value under the following subkey: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\AppContext

  • Value name: Switch.System.Threading.UseNetCoreTimer
  • Value data: true

The fix is described in full here: https://support.microsoft.com/en-za/help/4527212/long-spin-wait-loops-in-net-framework-on-intel-skyl...
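
One way to apply both values consistently across several servers is to generate a .reg file and import it. The sketch below only builds the file contents. The function name `build_reg_file` is an illustrative assumption, and in .reg syntax the hive is written HKEY_LOCAL_MACHINE. Review the output before importing it, and import it with administrator rights.

```python
REG_HEADER = "Windows Registry Editor Version 5.00"
NETFX_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework"


def build_reg_file() -> str:
    """Return .reg file text that sets the two values described above."""
    lines = [
        REG_HEADER,
        "",
        f"[{NETFX_KEY}]",
        # DWORD value with data 1 (written as 8 hex digits in .reg syntax)
        '"Thread_NormalizeSpinWait"=dword:00000001',
        "",
        f"[{NETFX_KEY}\\AppContext]",
        # String value with data "true"
        '"Switch.System.Threading.UseNetCoreTimer"="true"',
        "",
    ]
    # .reg files use Windows line endings.
    return "\r\n".join(lines)


print(build_reg_file())
```

To use it, save the returned text to a file such as `qrs_spinwait.reg` (a hypothetical name). Regedit itself exports .reg files as UTF-16, so writing the file with `encoding="utf-16"` keeps it in the same form.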

 

/ Cheers from the Scalability Center team

31 Replies
Not applicable

Both HT and NUMA disabled.  NUMA was disabled on all tests I ran.

Not applicable

Did you see a great improvement?

How did you launch 40 QVWs at the same time? With QV Desktop, maybe?

Isn't there a limit of 9 concurrent?

After changing the BIOS settings, is there anything else to do before QV will work?

Thanks.

Not applicable

To run 40 concurrent processes you need a few things:

  • A server that can handle the load (ideally 40+ cores)
  • Updated heap size settings (contact support for details, or search the qlikcommunity - there are quite a few posts on the matter)
  • An increased max number of engines setting on the QDS, set via the QMC

I ran the concurrent tasks by writing a script to generate the 40 clones of a test document, create QDS tasks for them, and schedule them to run concurrently every x minutes.
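
The cloning step of such a script could look roughly like the sketch below. It is not the script used in the tests above: the function name `clone_document` and the file naming scheme are assumptions, and creating and scheduling the QDS tasks is left out because it depends on the environment.

```python
import shutil
import tempfile
from pathlib import Path


def clone_document(source: Path, target_dir: Path, copies: int = 40) -> list[Path]:
    """Copy `source` into `target_dir` `copies` times with numbered names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    clones = []
    for i in range(1, copies + 1):
        # e.g. test.qvw -> test_01.qvw, test_02.qvw, ...
        clone = target_dir / f"{source.stem}_{i:02d}{source.suffix}"
        shutil.copy2(source, clone)
        clones.append(clone)
    return clones


if __name__ == "__main__":
    # Demo with a dummy file in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        test_doc = Path(tmp) / "test.qvw"
        test_doc.write_bytes(b"dummy document")
        clones = clone_document(test_doc, Path(tmp) / "clones")
        print(len(clones))  # 40
```

Each clone then gets its own QDS task, so that all 40 reloads can be scheduled to start at the same time.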

Not applicable

It seems weird to disable multithreading and NUMA in order to improve performance. Is there a logical explanation for this?

Miguel_Angel_Baeyens

Sure, there is a logical explanation. Further information is in the Technical Brief on Overview on QlikView Scalability and Performance, which you can download from www.qlik.com.

QlikView is at its very core a CPU- and RAM-intensive application, so the more powerful the CPUs and the more and faster RAM the system has, the better QlikView will perform.

With NUMA, each core basically accesses its own memory, and only when it runs out of that memory and needs more does it have to ask another core. In previous hardware configurations, where there was only one socket (one CPU on the motherboard), it was easy to access another core's memory, because the cores were hardwired together. In other words, NUMA is fine when you use physical hardware for virtualization or file storage, because it allows vCPUs to be linked to physical CPUs and resources to be managed better. It is not fine for a highly demanding application such as QlikView, which will use as much RAM as you give it.

Moreover, in today's 4- and 8-socket machines, the cores are no longer directly connected to each other. Instead they are grouped into hemispheres, so all communication has to pass over the bus from one hemisphere to the other, which creates bottlenecks. In addition, jumping from one core's memory to another's takes many CPU cycles and therefore degrades performance. In fact, it has been shown that the more cores you have, the poorer the performance you will get.

Disabling NUMA makes all memory available, regardless of the number of cores and the amount of RAM installed in the system, so QlikView can use all the memory without jumping from one core to another, which avoids those bottlenecks.

Note that not all processors allow disabling NUMA.

Hyper-threading is similar: the hardware spends extra CPU cycles dispatching instructions to one or the other of the threads that each core provides. However, the documentation I mentioned includes tests showing that 2-socket servers with hyper-threading have better throughput.

Kind regards,

Miguel

Anonymous
Not applicable

Miguel

Wow !!!

Many thanks for supplying that explanation.  I had always been confused as to why disabling NUMA is recommended for a QlikView Server and now I know as your explanation makes perfect sense.

Many Thanks,     Bill

Not applicable

Thanks Miguel, great explanation!!!

Michael_Tarallo
Employee

Great answer Miguel. There is also a technical brief for public consumption here:

Mike T

Regards,
Mike Tarallo
Qlik
Everest_QV_AppSupport
Contributor

I have an HP server with two options that are confusing me: Node Interleaving (Enabled/Disabled) and NUMA grouping (Clustered/Flat). They were set to Disabled and Clustered respectively. In the QMC I could only see 1/4 of the available cores. When our server team changed NUMA grouping to Flat I was able to see all my cores, but I'm wondering if Node Interleaving should still be changed to Enabled.

Frederic_De_Ranter

Hi Jonathan,

The two options are indeed a bit confusing. Most servers only have one option to enable/disable Node Interleaving, and when Node Interleaving is enabled, the processors are not grouped by NUMA node.

Apparently, on recent HPE servers, this has been split into two options.

So in your case it is recommended that you set Node Interleaving to Enabled (NUMA disabled) and NUMA grouping to Flat (so that the cores are not grouped according to their NUMA node).

By setting Node Interleaving to Enabled you make sure that QlikView spreads the data evenly over all memory available on all sockets. The software is normally able to detect NUMA, but it is still better to disable it.

Best regards,

Frederic