A few weeks ago, we moved our QV server cluster to new hardware, and what we got was a dramatic performance loss, especially when selecting in list boxes.
Since this loss of performance was also present on the desktop, I did some tests with it.
The result is very frustrating.
Not only is the new hardware a problem, but the QV versions from 12.30 onwards also perform significantly worse than version 12.20.
The test environment:
Old Server:
HP ProLiant DL380 Gen9
CPU: 2x Intel Xeon E5-2667 v3 (3200 MHz, 8 Cores)
RAM: 16x 32GB DDR4-2133 LRDIMM
OS: Windows Server 2012 R2 Standard
New Server:
HPE ProLiant DL380 Gen10
CPU: 2x Intel Xeon Gold 6254 (3100 MHz, 18 Cores)
RAM: 12x 64GB DDR4-2933 LRDIMM
OS: Windows Server 2016 Standard
The application I used has a working set of 21GB after opening and reaches a peak of 33GB during the test.
For the test, I use a macro that simulates the behavior of a user.
The test consists of 2 parts. First, a sheet with a dashboard (38 diagrams) is activated. Then a selection is made in 6 fields, one after another.
The results (all values in seconds):
HP ProLiant DL380 Gen9:
Test part | 12.20 SR10 | 12.40 SR3 |
Dashboard | 58.264 | 81.594 |
Selections | 45.885 | 132.595 |
Total | 107.119 | 240.83 |
HPE ProLiant DL380 Gen10:
Test part | 12.20 SR10 | 12.40 SR3 |
Dashboard | 127.255 | 168.244 |
Selections | 90.418 | 181.604 |
Total | 222.189 | 365.444 |
With version 12.30 SR3 I got the same bad results as with 12.40.
So 12.40 takes roughly twice as long as 12.20, and on Gen10 the run time roughly doubles again!!!
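The factors behind this statement can be checked directly from the totals in the two tables. The following is only a small sketch; the dictionary layout and the `slowdown` helper are names made up for illustration, and the numbers are the measured totals copied from above.

```python
# Total run times in seconds, copied from the two result tables above.
totals = {
    "Gen9": {"12.20 SR10": 107.119, "12.40 SR3": 240.83},
    "Gen10": {"12.20 SR10": 222.189, "12.40 SR3": 365.444},
}


def slowdown(slow, fast):
    """Return how many times longer `slow` takes than `fast`."""
    return slow / fast


# Version effect on each server (12.40 compared with 12.20).
for server, t in totals.items():
    ratio = slowdown(t["12.40 SR3"], t["12.20 SR10"])
    print(f"{server}: 12.40 / 12.20 = {ratio:.2f}x")

# Hardware effect for each version (Gen10 compared with Gen9).
for version in ("12.20 SR10", "12.40 SR3"):
    ratio = slowdown(totals["Gen10"][version], totals["Gen9"][version])
    print(f"{version}: Gen10 / Gen9 = {ratio:.2f}x")
```

With these totals, 12.40 takes about 2.25x as long as 12.20 on Gen9 and about 1.64x as long on Gen10, while Gen10 takes about 2.07x as long as Gen9 with 12.20 and about 1.52x as long with 12.40.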
Our NPrinting server is also a Gen10 server. We use NPrinting only with local qvw files. There are 78 jobs scheduled daily, all at the same time. If QV Desktop 12.20 SR10 is installed, NPrinting needs about 35 minutes to finish all jobs. If version 12.40 SR3 is installed, it needs more than 4 hours!!!
Only the Publisher has a very good performance on Gen10.
???
- Christian
Hi Christian,
The degradation you are seeing between QV 12.20 and 12.40 is not normal, and I would highly recommend that you open a support case for this.
I have recently tested a server with 2 Intel Xeon Gold 6254 and compared it to a server with 2 Intel Xeon E5-2687Wv3 (which has a few more cores than yours, but lower clock speed) and can say that in most test cases the new CPU has much better performance. In one test case, where I test an app that is specifically designed to have a lot of single-threaded operations, the older CPU is the better performer. Generally, for single-threaded operations, servers with fewer cores (and higher turbo boost clock speeds) will perform better.
To figure out how the sheet in your app is using the CPU, you can open the Task Manager on the server, switch to the Performance tab and change the view to logical processors. Then you can see what happens when you open the sheet: whether all cores are working all the time, or whether there are moments (before the sheet is fully displayed) when only 1 or 2 cores are working at 100%.
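If you prefer numbers over watching the graphs, the same check can be sketched in a few lines of Python. This is only an illustration: the function name, the thresholds (95% busy, 20% idle) and the sample values are assumptions, not measurements. The per-core samples could, for example, be collected once per second with `psutil.cpu_percent(interval=1, percpu=True)` if the third-party psutil package is installed on the server.

```python
def single_thread_share(samples, busy=95.0, idle=20.0, max_busy_cores=2):
    """Return the fraction of samples that look single-threaded.

    A sample is a list of per-core utilization values in percent. It counts
    as single-threaded when 1..max_busy_cores cores are at or above `busy`
    and every other core is at or below `idle`.
    """
    if not samples:
        return 0.0
    hits = 0
    for per_core in samples:
        busy_cores = sum(1 for u in per_core if u >= busy)
        others_idle = all(u >= busy or u <= idle for u in per_core)
        if 1 <= busy_cores <= max_busy_cores and others_idle:
            hits += 1
    return hits / len(samples)


# Hypothetical samples for a 4-core machine, one list per second.
samples = [
    [100, 5, 3, 4],     # one core busy, the rest idle
    [98, 97, 2, 1],     # two cores busy, the rest idle
    [90, 85, 95, 88],   # all cores loaded
    [100, 99, 97, 96],  # all cores saturated
]
print(single_thread_share(samples))
```

A high share while the sheet is being calculated would suggest that single-threaded operations dominate, in which case clock speed matters more than core count.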
Some more things you could try out (I took a quick look at your BIOS settings file):
Best regards,
Frederic
Hello Frederic,
thanks a lot for your response. I have done some tests again on HPE Gen10.
1. Basically, all cores work when calculating the dashboard: one is permanently at 100% and the others are between 60% and 100% with 12.20 and between 90% and 100% with 12.40.
2. With hyperthreading enabled, performance decreases by about another 30%.
3. Node Interleaving / NUMA. My point of view:
The parameter NUMA_Group_Size_Optimization is only meaningful if there is more than one NUMA group. The parameter Sub-NUMA_Clustering is decisive here. The DL380 Gen10 has 2 CPUs and each of them has 2 memory controllers. If all 36 cores are enabled, the following applies:
However, for the tests with QlikView Desktop, NUMA brings only a minimal improvement. For QlikView Server with NUMA, performance fluctuates heavily: dashboards range from very fast to very slow, and selections from OK to extremely bad.
4. Your 3rd point concerns the CPU feature "Core Boosting". The Xeon Gold 6254 does not support this feature. I can save power by setting C-States, but I don't get more performance.
5. Only switching off cores really brings a performance gain:
Test part | 12.20 SR10, 36 cores | 12.20 SR10, 18 cores | 12.40 SR3, 36 cores | 12.40 SR3, 18 cores |
Dashboard | 127.255 | 87.496 | 168.244 | 115.837 |
Selections | 90.418 | 63.372 | 181.604 | 145.088 |
Total | 222.189 | 154.306 | 365.444 | 302.194 |
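To put the gain from switching off cores into numbers, here is a small sketch based on the totals in the table. The `time_saved_pct` helper and the dictionary layout are made up for illustration; the values are the measured totals in seconds.

```python
# Total run times in seconds on the Gen10, copied from the table above.
gen10_totals = {
    "12.20 SR10": {"36 cores": 222.189, "18 cores": 154.306},
    "12.40 SR3": {"36 cores": 365.444, "18 cores": 302.194},
}


def time_saved_pct(before, after):
    """Return the run time reduction from `before` to `after` in percent."""
    return (before - after) / before * 100


for version, t in gen10_totals.items():
    saved = time_saved_pct(t["36 cores"], t["18 cores"])
    print(f"{version}: {saved:.1f}% less run time with 18 cores")

# Version gap that remains with 18 cores enabled.
gap = gen10_totals["12.40 SR3"]["18 cores"] / gen10_totals["12.20 SR10"]["18 cores"]
print(f"12.40 / 12.20 with 18 cores = {gap:.2f}x")
```

Halving the cores saves about 30.6% of the run time with 12.20 and about 17.3% with 12.40, and with 18 cores 12.40 still takes about 1.96x as long as 12.20.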
Does this mean that it makes no sense to use a CPU with more than 10 cores in an HPE Gen10?!
Anyway. The performance of the Gen9 cannot be achieved on the Gen10. No matter what I set, the performance of 12.40 is always worse by a factor of 2 than 12.20.
Best regards
Christian
Hi Christian,
About NUMA: we've seen from our own and our customers' experience that it is actually really important to set NUMA_Group_Size_Optimization correctly, even when Node_Interleaving = enabled & Sub-NUMA_Clustering = disabled.
Maybe HPE has updated their BIOS to change this since I have not tested this lately.
The Xeon Gold 6254 does support core clock boosting (as do all second generation Xeon Gold scalable processors). (https://ark.intel.com/content/www/us/en/ark/products/192451/intel-xeon-gold-6254-processor-24-75m-ca...) How long and how far the clock of a single core can be boosted depends heavily on the amount of power the CPU can dissipate. To simplify, we can state that a CPU can use at most its defined TDP (for your CPU that is 200W). If some cores can be lowered in clock speed and thus dissipate less power, others will be able to boost longer. But of course, if your CPU utilization shows that all cores are in use (and the CPU thus already reaches its maximum power dissipation), no cores will be lowered in clock speed and there is no extra room to boost more.
Disabling cores has a similar effect: since fewer cores are dissipating power, the remaining cores have more room to boost.
If you have doubts about the performance of your Gen10 server, I can recommend running the HW BM package for QV 12, which you can find here on community: https://community.qlik.com/t5/Qlik-Scalability/QlikView-12-Hardware-Benchmarking-Package/gpm-p/14786.... You can then send in the results you get and we will send you back a report, to compare it with a similarly sized server.
As stated in my previous message, I have tested this CPU in a server from another vendor with the HW Benchmarking package and it is performing according to its capacity. E.g. for a heavy load test with the Hardware Benchmarking application:
CPU | Settings | Average response time (ms) | Average CPU utilization |
Xeon Gold 6254 | HT off | 3190 | 84% |
Xeon Gold 6254 | HT on | 2363 | 89% |
Xeon E5-2687Wv3 | HT off | 7887 | 94% |
Xeon E5-2687Wv3 | HT on | 5289 | 93% |
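As a rough sketch of what these numbers imply, the effect of hyperthreading in this benchmark can be computed from the table. The `ht_gain_pct` helper and the dictionary layout are names made up for illustration; the response times are the values from the table.

```python
# Average response times in ms from the benchmark table above.
response_ms = {
    "Xeon Gold 6254": {"HT off": 3190, "HT on": 2363},
    "Xeon E5-2687Wv3": {"HT off": 7887, "HT on": 5289},
}


def ht_gain_pct(cpu):
    """Return the response time reduction from enabling HT, in percent."""
    off = response_ms[cpu]["HT off"]
    on = response_ms[cpu]["HT on"]
    return (off - on) / off * 100


for cpu in response_ms:
    print(f"{cpu}: {ht_gain_pct(cpu):.1f}% shorter response time with HT on")
```

In this heavy load test, enabling hyperthreading shortens the average response time by about 25.9% on the Xeon Gold 6254 and by about 32.9% on the Xeon E5-2687Wv3, which is the opposite of the loss of about 30% you reported for your application.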
Of course, the data model and the application design are also very important when it comes to getting optimal performance. I would recommend that you open a support ticket so that all aspects of this issue can be looked at.
Best regards,
Frederic