Hi guys,
I am looking for a second opinion, although we are already working with our vendors and Qlik reps. We just purchased a new server and found out that it performs terribly with QV, even though it cost 4 times as much and is far more powerful on paper. We are going to return this server to our vendor (HP) and get something different, so I would like opinions on the ideal (fastest) HP server for QV.
Right now my company uses a single server to run all QV services, including QVS, QDS and AccessPoint. It is very fast, but we are running out of RAM and it is not expandable, so we are in the process of getting a second QV server that will run only AccessPoint and QVS.
Our old server:
2-socket HP DL380p Gen8
Intel Xeon CPU E5-2690 2.90GHz
256GB of RAM
Our new server (which we are going to return):
4-socket HP DL580 Gen8
Intel Xeon CPU E7-4870 v2 2.30GHz
1TB of RAM
I was super excited about the DL580, and when it arrived I started running some tests. We have a fairly complex dashboard with 90+ million rows and a lot of heavy calculations, and I found out that our old server was 2-5 times faster on most of the tabs. We tuned the server using the QV technical paper; specifically, we made sure that the maximum-performance power profile is used, hyper-threading is off and NUMA is off. Turbo Boost is on.
It helped a little, especially when we disabled NUMA. I am confused why, though, because according to QV, NUMA is not a problem on 11.2, but apparently it is. After we disabled NUMA, things improved by maybe 10-20%, but the server was still way slower than our old one.
We reached out to HP, and one of their senior architects explained that the DL580 has to do more work with RAM because of the way RAM is shared between processors, and that its memory clock speed is 1333MHz vs. 1666MHz on the DL380.
At that point I downloaded and ran 4 different benchmarking tools (MAXXMEM2, NOVABENCH, PassMark and SiSoft Sandra), and all of them showed a 2-6 times difference in the RAM tests. The DL580 was slower again!
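For a quick sanity check alongside those tools, a crude memory-copy test can be run on both servers with identical settings. This is a minimal Python sketch, not a replacement for MAXXMEM2 or SiSoft Sandra: it is single-threaded, includes interpreter overhead, and the function name copy_bandwidth_gbps is hypothetical. It only makes sense as a relative comparison between machines.

```python
import time


def copy_bandwidth_gbps(size_mb: int = 256, repeats: int = 5) -> float:
    """Rough single-threaded memory-copy bandwidth in GB/s (best of `repeats` runs)."""
    src = bytearray(size_mb * 1024 * 1024)  # zero-filled source buffer
    dst = bytearray(len(src))               # destination of the same size
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src                        # same-length slice assignment copies every byte
        best = min(best, time.perf_counter() - start)
    # Count bytes read plus bytes written, as the STREAM "copy" kernel does.
    return 2 * len(src) / best / 1e9
```

Call copy_bandwidth_gbps() on each server and compare the numbers. Keep the buffer well above the combined L3 cache of all sockets (the default 256MB is meant for that), otherwise you are measuring cache rather than RAM.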
Our goals for a new server:
I apologize for the long introduction, but now I am puzzled about which HP server we should pick, since the DL580 with 4 processors clearly does not meet our needs.
I am not a hardware expert and was relying on our hardware people and Qlik, but apparently the configuration they picked did not meet our goals. They are revising the configuration, but I wanted to get a second opinion from the forum.
I am now thinking of getting either the fastest E5 or the fastest E7, and this time with only two sockets to minimize memory hops. I also wanted to see if we can use higher-clocked RAM (1887MHz?).
It is going to be a rack server from the HP ProLiant family; you can build it online here:
http://www8.hp.com/us/en/products/proliant-servers/index.html?facet=ProLiant-DL-Rack
Any suggestions are highly appreciated!
Our final configuration
HP DL380 Gen9
2 CPUs, 8-core E5-2667 v3 3.2GHz
512GB of DDR4 LRDIMM at 2333MHz
(you could go up to 768GB, but that forces the RAM to run at 1600MHz)
BIOS settings:
Hyper-threading off, hardware prefetch enabled, NUMA enabled, virtualization off, power profile set to High Performance, Turbo Boost on
Read below for more details.
We just got the E5-2667 v3 CPU (8-core, 3.2GHz) and I just finished testing it. We also purchased 768GB of DDR4 RAM, but with that amount it can only run at 1600MHz, and I wanted to test it at its native clock speed of 2333MHz.
Boy, did that make a difference! With the new CPU and 768GB of RAM I saw a 15-20% performance improvement, but once we removed some DIMMs to get down to 512GB (so the RAM could run at 2333MHz), I saw a 30-35% improvement compared to our old server, and as much as 60-65% on some sheets. The biggest gain was on one of our most popular Summary dashboard pages, which calculates over 75 metrics at once. On our old server that page took 15-20 seconds to calculate; now it takes 5-10 seconds, quite an improvement.
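The 768GB vs. 512GB trade-off comes down to how many DIMMs sit on each memory channel. The Python sketch below shows the arithmetic. It assumes 4 memory channels per socket and 32GB modules, and the speed table holds only the two speeds reported in this thread, not vendor figures; the class name MemoryLayout is hypothetical. Check your own server's memory population guide before relying on it.

```python
from dataclasses import dataclass

# Memory speeds keyed by DIMMs per channel, as reported in this thread for the
# DL380 Gen9 build (512GB -> 2333MHz, 768GB -> 1600MHz). These are not vendor
# figures; check your server's memory population guide.
REPORTED_SPEED_MHZ = {2: 2333, 3: 1600}


@dataclass
class MemoryLayout:
    sockets: int
    channels_per_socket: int  # assumption: 4 channels per socket
    dimms_per_channel: int
    dimm_size_gb: int         # assumption: 32GB modules

    def total_gb(self) -> int:
        return (self.sockets * self.channels_per_socket
                * self.dimms_per_channel * self.dimm_size_gb)

    def speed_mhz(self) -> int:
        # Raises KeyError for populations not reported in this thread.
        return REPORTED_SPEED_MHZ[self.dimms_per_channel]


for dpc in (2, 3):
    layout = MemoryLayout(sockets=2, channels_per_socket=4,
                          dimms_per_channel=dpc, dimm_size_gb=32)
    print(f"{dpc} DIMMs/channel: {layout.total_gb()}GB at {layout.speed_mhz()}MHz")
```

Under these assumptions, 16 DIMMs (2 per channel) give 512GB and 24 DIMMs (3 per channel) give 768GB, which matches the two configurations tested above.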
So my takeaways from this long journey are:
0) A hardware upgrade should be the last resort: use best practices when you build your apps! Spend a good amount of time on your data model. If it looks like a spider web or spaghetti, redo it. The same goes for expressions: a bad expression can crash your server no matter how powerful it is. In our case, we had a very decent data model and an aggregated version of our most popular dashboard, but it was not enough; our user base was growing rapidly, so the upgrade was justified.
1) Do not trust the hardware specs in the case of QlikView: more expensive / faster hardware does not mean your QlikView dashboards will be faster. The 4-CPU DL580 server was 4 times slower for us and 3 times more expensive! Luckily we were able to return it to the vendor. Pick your most-used dashboard with a lot of calculations and test, test, test.
2) The more RAM you install, the slower it may run (populating more DIMMs per channel can force a lower clock speed, as in our 768GB vs. 512GB test). Get high-performance RAM (DDR4) and make sure it operates at the highest clock speed it supports.
3) Higher-clocked CPUs matter! If you want speed, get one with fewer cores but a higher clock. If you want capacity to handle more users, get one with more cores, but it will be slower.
4) Also keep in mind that CPUs with a large number of cores (above 8) have some overhead accessing RAM, so they will be slower (but will handle more concurrent users).
5) Play with and test BIOS settings. I ended up with the following:
Hyper-threading off, hardware prefetch enabled, NUMA enabled, virtualization off, power profile set to High Performance, Turbo Boost on
6) 4-CPU beasts are NOT a good choice; they are slow with QlikView. Go with 2-CPU servers.
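The "test, test, test" advice is easier to act on with a consistent way to time runs and compare servers. This Python sketch times any callable several times and reports the reduction in median time; how you actually trigger a sheet calculation (the workload callable) is up to you and is not part of the original post, and both function names are hypothetical.

```python
import statistics
import time
from typing import Callable, List, Sequence


def time_runs(workload: Callable[[], object], runs: int = 5) -> List[float]:
    """Run a workload several times and return wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def percent_faster(old: Sequence[float], new: Sequence[float]) -> float:
    """Reduction in median time, as a percentage of the old median."""
    old_med = statistics.median(old)
    new_med = statistics.median(new)
    return 100.0 * (old_med - new_med) / old_med
```

For example, percent_faster([20.0], [10.0]) returns 50.0 and percent_faster([15.0], [5.0]) returns about 66.7, the two ends of the Summary-page result reported above. Medians are used so that a single slow first run (cold cache) does not skew the comparison.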
Hope this helps someone! It took us 9 months, and I am finally happy with our choice. I always knew that hardware should be picked with the software it will run in mind, and in QlikView's case it was very evident!
Hi Borys,
We are using the same set of servers and facing the same issue with an HP ProLiant DL580 Gen7. One thing that improved the server response was reducing the number of CPUs used by the QlikView server from 64 to 32.
The performance improved a bit. You can try that on your system and share your feedback.
Thanks,
Sagar
We had the same problem with a different server. Someone gave me the following explanation, which may apply to your problem as well. I hope it helps (the problem is going from two to four sockets):
-------
Frank,
We ran into this as well on our QV Publisher box, but first let me say that as a standard we disable NUMA (enable node interleaving) on all our QlikView hardware (physical and virtual), and we have measured a 35-45% performance boost from that change alone. For the QVS, we see the impact of NUMA mostly on chart calculations and rendering with AJAX clients (plus all the CPUs tend to stay at 100% for a few wall-clock seconds if NUMA is on, even for a single chart calculation). If you search the Scalability Lab sections of the community, you will find several hardware setting recommendations that basically say: disable NUMA, disable hyper-threading unless you are on a 2-way Intel E5-xxxx architecture, enable hardware prefetch, and maybe one or two others I'm forgetting here.
I'm guessing your server has 4 physical chips (sockets) with 8 cores each, for a total of 32 cores, probably laid out like this on the board:
1 2
3 4
I can't give you the true "nuts and bolts" technical explanation, but I will try to paraphrase the QlikView Scalability Lab's recommendation to us. Basically, a 4-socket architecture presents issues for QlikView because of the way QlikView splits processes across multiple cores on multiple chip sockets, and because of inefficiencies in managing resources between the cores on chips 1 and 4, and on chips 2 and 3. So if a single chart (or similar object) uses cores on chips 1 and 4 OR 2 and 3 to calculate, there are some efficiency issues in splitting the work across those sockets; if it uses cores from 1 and 2 OR 3 and 4, those inefficiencies are not present, so a 2-socket architecture would not have this issue. When the QlikTech Scalability Lab and I modified the CPU affinity for Publisher as per the instructions below, we measured about a 50% performance improvement. Our QVS machines were already all 2-socket, so I have not experienced it on those, but I expect it would be similar.
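To make the socket-pairing idea concrete, here is a minimal Python sketch that builds an affinity string enabling only the CPUs on chosen sockets. It rests on two assumptions that are not stated in this thread: logical CPUs are numbered contiguously per socket, and the leftmost character of the string maps to CPU 0. The function name is hypothetical; verify the numbering on your own hardware before using a mask.

```python
from typing import Iterable


def socket_affinity_mask(sockets_to_use: Iterable[int], num_sockets: int = 4,
                         cores_per_socket: int = 8) -> str:
    """Return a CPUAffinity-style string enabling only the CPUs on the given sockets.

    Hypothetical helper. ASSUMPTIONS, verify on your own hardware:
    - logical CPUs are numbered contiguously per socket (socket 1 owns
      CPUs 0..cores_per_socket-1, socket 2 the next block, and so on);
    - character i of the string, counted from the left, maps to CPU i.
    """
    wanted = set(sockets_to_use)
    if not wanted or not wanted <= set(range(1, num_sockets + 1)):
        raise ValueError(f"sockets must be a non-empty subset of 1..{num_sockets}")
    return "".join(("1" if s in wanted else "0") * cores_per_socket
                   for s in range(1, num_sockets + 1))
```

With the 4 x 8 layout guessed above, socket_affinity_mask({1, 2}) yields 16 ones followed by 16 zeros, and socket_affinity_mask({3, 4}) yields 16 zeros followed by 16 ones.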
The QVS CPU affinity can be changed in the Management Console, but for Publisher you have to change the CPU affinity through a settings file:
ProgramData\QlikTech\DistributionService\Configuration.xml
The row: <QVBConfig MaxSimultaneousQVBs="4" MaxSimultaneousReaderQVBs="20" CPUAffinity="11111111111111111111111111111111" CPUPriority="Low" />
If you have 32 logical cores, as in my case, you will see 32 ones, indicating that all of them are active. If you want to use only the last 16, do the following:
1. Stop QDS service
2. Replace the first 16 ones with zeros
3. This would give the following setting: .." CPUAffinity="00000000000000001111111111111111" ..
4. Start the QDS service
You are now using only half of the available cores.
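Step 2 can be scripted rather than edited by hand. This Python sketch uses the standard library's ElementTree to replace the CPUAffinity attribute on QVBConfig elements; it assumes the element carries no XML namespace, and the function name is hypothetical. Note that ElementTree does not preserve comments or the XML declaration when it writes the file back, so keep a backup of Configuration.xml.

```python
import xml.etree.ElementTree as ET


def set_qvb_cpu_affinity(xml_text: str, mask: str) -> str:
    """Return xml_text with the CPUAffinity attribute of every QVBConfig element replaced."""
    if not mask or set(mask) - {"0", "1"}:
        raise ValueError("mask must be a non-empty string of 0s and 1s")
    root = ET.fromstring(xml_text)
    # iter() includes the root itself; assumes QVBConfig has no namespace prefix
    elements = list(root.iter("QVBConfig"))
    if not elements:
        raise ValueError("no QVBConfig element found")
    for element in elements:
        old = element.get("CPUAffinity")
        # Refuse to change the number of positions (one per logical core)
        if old is not None and len(old) != len(mask):
            raise ValueError(f"mask has {len(mask)} positions, existing value has {len(old)}")
        element.set("CPUAffinity", mask)
    return ET.tostring(root, encoding="unicode")
```

With the QDS service stopped, read the file from the ProgramData path above, pass its text together with "0" * 16 + "1" * 16 (the setting from step 3), write the result back, and then start QDS again.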
We were also able to improve QV Publisher performance by updating the BIOS on our Windows 2008 server. Check out the details below; hopefully they help.
What --> Upgrade System BIOS
Faulty BIOS --> P65 07/01/2013
Fix --> Apply SP63928.exe to upgrade to P65 10/01/2013
More Info --> HP Support Center (http://h20566.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/swdDetails/?javax.portlet.begCac...)
Hi,
"QVBConfig MaxSimultaneousQVBs" this is only for reloads and will not improve the QVS performance 😕
/olli
I am glad we are not the only ones facing this issue. Thanks for all your replies; I feel better now.
I already reviewed and applied the settings recommended in the Scalability Lab document. Disabling NUMA helped a little; I noticed maybe a 10-20% improvement. As for the other settings, like disabling hyper-threading and disabling prefetch, I do not think they actually did anything.
The BIOS is current; our server is literally straight out of the factory. Even the CPUs are from the most recent batch, so everything is fresh.
I did play with CPUAffinity for AccessPoint, disabling the last half of the CPU cores and rebooting afterwards, but it did not help at all. Our HP architect (a super knowledgeable guy who works for HP) recommended removing 2 CPUs, but that does not make sense to me because each processor cost us about 6k.
I think we are moving towards exchanging the server for a 2-socket one and probably getting the fastest Xeon E7 v2. The problem there, though, is that most 2-socket configurations can only address 512GB of RAM, or 768GB at most (with slower DIMMs).
But all in all, it has been a big disappointment for us so far: we spent 4 times more on this server than our old one cost, and it is literally 4 times slower than the old one, which is ironic and sad.
I now wonder whether Qlik could improve something in the QV engine itself. 4- and 8-socket machines are pretty cheap these days, and we will soon see even more powerful servers, so I believe this is something Qlik needs to look at and address in the future.
Sagar,
By reducing from 64 to 32, did you mean setting the processor affinity to every other core?
In effect, that would disable hyper-threading.
Frank Beunder wrote:
...The QVS CPU affinity can be changed in the management console,
Frank,
What pattern did you use in the Management Console? Something like 000000000000011111111111111 or 101010101010101?
Yes KlickiBunti,
I found a document on the internet about best practices for configuring the HP ProLiant DL580 Gen8 server. According to it, hyper-threading causes issues when the number of CPUs is more than 32. To overcome this, we reduced the CPU affinity from 64 CPUs to 32 CPUs.
http://h20195.www2.hp.com/V2/GetPDF.aspx%2F4AA5-1110ENW.pdf
The CPU affinity is set like this: 1111111111111111111111111111111100000000000000000000000000000000
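The two 64-to-32 patterns discussed in this thread can be generated and compared with a short Python sketch. Both enable 32 of 64 logical CPUs, but they are not equivalent. Assuming (not confirmed in this thread) that the two hyper-threads of each physical core are numbered as adjacent logical CPUs and that the leftmost character maps to CPU 0, the alternating pattern keeps one thread on every physical core (roughly "hyper-threading off"), while the contiguous pattern keeps both threads on half the cores. The function names are hypothetical.

```python
def contiguous_mask(total: int, enabled: int) -> str:
    """First `enabled` logical CPUs on, the rest off (the pattern posted above)."""
    if not 0 <= enabled <= total:
        raise ValueError("enabled must be between 0 and total")
    return "1" * enabled + "0" * (total - enabled)


def alternating_mask(total: int) -> str:
    """Every other logical CPU on: '1010...'.

    ASSUMPTION: sibling hyper-threads are adjacent (CPUs 2k and 2k+1 share a
    physical core), so this keeps one thread per physical core.
    """
    return "".join("1" if i % 2 == 0 else "0" for i in range(total))
```

contiguous_mask(64, 32) reproduces the string above, and alternating_mask(64) gives the "1010..." pattern asked about earlier. Which one performs better on a given box has to be measured.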