Not applicable

Capacity Planning (Server RAM)

We have a QV Server and for now just have a single dashboard running.  As we prepare to add dashboards, we are looking at capacity planning for the server.  I'm using the formulas gathered from QlikTech documentation, but numbers are just not what I expect, so I'm hoping for guidance from folks here that have experience with enterprise deployments.

Some background:

     Server RAM:  96,000 MB

     Dashboard Size = 75 MB

     Users (maximum if all licenses in use concurrently) = 550

Calculations:

     Server resources:  1,000 MB for the OS + 30 MB for QV Server

     User 1:  75 * 7 (using a rather pessimistic multiplier of 7 on a scale of 4 to 10) = 525 MB

     Additional users (using a 10% additional-users factor, again pessimistic):  549 * (525 * 10%) = 28,823 MB

     Total for users = 29,348 MB; add 400 MB for QV services, for a total of 29,748 MB.
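The arithmetic above can be sketched as a small calculation. This is only my own restatement of the QlikTech sizing formula with the figures from this post (the variable names are mine); it folds the OS and service overheads into the same total, which is why the result comes out slightly above the 29,748 MB user figure:

```python
def qvs_ram_estimate_mb(doc_size_mb, users, multiplier=7, extra_user_factor=0.10,
                        os_mb=1000, qvs_mb=30, services_mb=400):
    """Estimate QlikView Server RAM (MB) per the sizing formula in this thread."""
    first_user = doc_size_mb * multiplier                        # first user loads the whole document
    extra_users = (users - 1) * first_user * extra_user_factor   # each further user adds ~10% of that
    return os_mb + qvs_mb + first_user + extra_users + services_mb

print(round(qvs_ram_estimate_mb(75, 550)))  # 30778 (MB), i.e. ~30 GB for 550 concurrent users
```

Running the same function with 150 concurrent users (the actual peak mentioned above) gives a much smaller figure, which is what makes the observed 70 GB so surprising.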

This assumes, of course, 550 concurrent users (since this dashboard is relatively new, the actual maximum usage is much less).  In fact, to date I don't think we've ever exceeded about 150 concurrent users.

Yet when I examine the processes on the server during an idle time when no users are online, I see (with just this one dashboard on the server, currently) about 70 GB of RAM devoted to the qvs.exe process.  Further, I see the following Performance numbers at the console:

VMCommitted: 67,090.86 MB
VMAllocated: 75,493.73 MB
VMFree: 22,807.14 MB
VMLargestFreeBlock: 22,807.14 MB

So am I missing something here with regard to how I have modeled calculations to estimate memory requirements?

Are the formulas above correct?  Perhaps the first-user multiplier (7) and/or the additional-users factor (10%) is off, although both are pessimistic within the ranges QlikTech provides.

Why would the qvs.exe process claim this much RAM?

Thoughts and ideas most appreciated.

7 Replies
bnichol
Specialist

Because you are only serving a single document, it should be relatively simple to investigate the RAM usage.

What is the RAM usage when the document initially loads?

How quickly does the RAM ramp up?  Does it constantly run at 70 GB?

How are users connecting to the document?  IE-Plugin, AJAX?

How often is the document reloaded?  Do you have the QVS configured for only a single copy of the document?

Have you run the document analyzer on the document?  It can be found at http://robwunderlich.com/downloads/

Is Collaboration turned on?  Have you investigated the Collaboration objects using the power tools? http://community.qlikview.com/docs/DOC-3059

How often do you restart your server or the QlikView Server service?  We do a daily server restart to eliminate any long-term memory leak buildup.

The RAM usage is primarily based on the volume of unique data in the document, but we have some charts that are using calculations that can cause 10 - 20 GB spikes in usage.
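The "volume of unique data" point can be made concrete. QlikView stores each field as a symbol table of distinct values plus a compact per-row index into that table, so a rough per-field estimate looks like the sketch below. This is my own back-of-envelope model, not an official Qlik formula, and it ignores per-object and calculation overhead entirely:

```python
import math

def field_ram_bytes(rows, distinct_values, avg_value_bytes):
    """Rough per-field RAM: a symbol table of distinct values plus a
    bit-packed index per row pointing into that table (QlikView-style storage)."""
    symbol_table = distinct_values * avg_value_bytes
    # bits needed to address every distinct value (at least 1 bit per row)
    bits_per_row = max(1, math.ceil(math.log2(distinct_values))) if distinct_values > 1 else 1
    row_index = rows * bits_per_row / 8  # bytes
    return symbol_table + row_index

# e.g. 1.5M rows, a field with 10,000 distinct 20-byte values:
print(round(field_ram_bytes(1_500_000, 10_000, 20)))  # 2825000 bytes, ~2.8 MB
```

This is why high-cardinality fields (timestamps, free text, surrogate keys) dominate RAM: both the symbol table and the bits-per-row index grow with distinctness.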

Regards,

B

Miguel_Angel_Baeyens

Hi,

A few notes here. That 10% is an average, and depends largely on how the document is developed: number of rows, length of values, data model schema, level of granularity and distinctness of data, the number of charts, concurrency, cached selections and so on.

Besides, all documents stay in memory until their timeout happens, so you may find that a document that is not in use by anybody is still loaded, which makes sense. If you are preloading, that uses RAM as well. The same happens with cached selections: the more objects you have, the more RAM you will use, until the working set limit is reached and the server starts to free memory to cache new queries. If you are using section access and reduction, that takes some RAM as well...

In addition, DMS authorization stores info in the .Shared and .Meta files, as well as the documents set to collaboration, notes, shared objects and bookmarks...

In this sense, concurrency may "happen" even when users are not logged in, because their copy of the document is still in memory, depending on the timeouts.

To make a more accurate approach I'd measure RAM usage in each of the following steps (you may add as many additional steps as you want to make the review more accurate yet):

  • Make sure the preload option is set to "never" in all documents (while testing)
  • For testing purposes, set all document timeouts to a very low value (2 or 3 minutes)
  • Reboot the computer and log on, and check
  • Open QlikView Desktop, open the QVW file, and check.
  • Close QlikView Desktop
  • Start QlikView Services and check
  • Restore preloading settings in all documents if any and check
  • Make one user log on and loading a document using Ajax and check
  • (Same with IE Plugin if users will use it)
  • Make a second user (while the first is still logged in) log in and open a document, and check
  • Log off both users, let the document timeout and check

From this point on, each user should add about the same amount of memory on average. Now start making selections: CPU will go up while computing, but RAM will creep up as well, since each document selection is cached. You should see that when the timeout is reached, some RAM is freed.
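The step-by-step procedure above produces a series of RAM checkpoints; turning them into per-step deltas makes the per-user increment easy to read off. A minimal bookkeeping sketch (the checkpoint readings here are invented for illustration, not real measurements):

```python
def ram_deltas(checkpoints):
    """Given (step, ram_mb) checkpoints taken after each test step,
    return (step, ram_mb, delta_from_previous_mb) tuples."""
    out, prev = [], None
    for step, ram in checkpoints:
        out.append((step, ram, ram - prev if prev is not None else 0))
        prev = ram
    return out

checkpoints = [  # hypothetical qvs.exe working-set readings, in MB
    ("services started", 8),
    ("user 1 opens doc", 540),
    ("user 2 opens doc", 600),
    ("both logged off + timeout", 560),
]
for step, ram, delta in ram_deltas(checkpoints):
    print(f"{step:28s} {ram:6d} MB  ({delta:+d} MB)")
```

Comparing the "user 2" delta against the first-user delta gives a measured value for the additional-users factor, instead of the assumed 10%.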

Hope that makes some sense and the tests go fine. Let us know anyway.

Miguel

Not applicable
Author

Thanks so much, bnichol and Miguel.  Very helpful information.  I wasn't aware of Rob's Document Analyzer, and I find that a very useful tool.

I think I'm seeing the very high memory utilization for the qvs.exe process on the server simply due to memory creep over time, presumably attributable to a leak as the document is loaded/unloaded and many users open and close the document over a long period.  I'll have to look into automating a regular restart (or at least stopping and starting the service).
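A scheduled restart can be as simple as stopping and starting the Windows service from a task that Task Scheduler runs nightly. A minimal sketch, assuming the default service name "QlikView Server" (verify the exact name in services.msc; the script must run with administrator rights):

```python
import subprocess

SERVICE = "QlikView Server"  # assumed default; check services.msc for the real name

def restart_commands(service):
    """Build the net stop / net start command lines for a service restart."""
    return [["net", "stop", service], ["net", "start", service]]

def restart(service=SERVICE):
    for cmd in restart_commands(service):
        subprocess.run(cmd, check=True)  # raises if either step fails
```

Scheduling the restart during the nightly reload window avoids kicking off active sessions.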

I ran a number of tests and recorded memory usage for the Server service (qvs.exe) to confirm.  I've attached the results for review by others, if interested.

Details are in the attached workbook, but to summarize:

Using QV10 SR2 on a server (Windows Server 2008 R2 SP1).

Dashboard is a 70 MB document, approx. 1.5 million records in the fact table.

For each round of testing, I stopped and re-started the service to begin with a clean slate (on startup the service uses < 8 MB of RAM).

For single user testing, I ran three types of test, each with multiple iterations to record the memory after each.  For multi-user testing, I ran with three users, each performing identical functions.  In both single and multi-user testing, I tested with just opening the document, then closing.  Then I tested by opening and viewing a handful of sheets.  Finally, I stress-tested by opening the document, visiting all sheets and viewing all charts (~125 charts across 18 sheets).

As you can see from the results in the attached workbook, required memory seems to expand more or less as expected with additional users.  However, memory is not released after each document close (after timeout), so over time this results in more and more RAM being allocated.  Once the working set Low Threshold is exceeded, memory gets swapped to disk, which of course degrades performance.
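The working set Low/High limits in QVS are configured in the QMC as percentages of physical RAM, so converting them to absolute numbers for this server is a one-liner. The 70%/90% figures below are the usual defaults (verify against your QMC settings):

```python
def working_set_limits_mb(physical_ram_mb, low_pct=70, high_pct=90):
    """QVS working-set Low/High limits as absolute MB (QMC stores percentages)."""
    return physical_ram_mb * low_pct / 100, physical_ram_mb * high_pct / 100

low, high = working_set_limits_mb(96_000)
print(f"Low: {low:,.0f} MB  High: {high:,.0f} MB")  # Low: 67,200 MB  High: 86,400 MB
```

Notably, the VMCommitted figure earlier in the thread (~67,090 MB) sits just under 70% of 96,000 MB, which would be consistent with the server simply letting cache grow up to the Low working-set limit rather than leaking.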

This testing has helped me tweak our capacity planning model (at least for this particular dashboard).  I've been testing on our TEST server, so I will look into scheduled restarts of the server (or services) in our PROD environment, then monitor as we move forward and see how it goes.

Not applicable
Author

Thanks so much again bnichol.

I have details of my testing in the attachment of the other reply here.  To clarify a few things you asked about.. using QV 10SR2 with IE plugin, reloaded once each day, only one copy allowed in memory.  With regard to collaboration, we only use server bookmarks (so far).

As you've suggested, I think the memory leak/creep is the main issue here... we don't currently do a scheduled restart of the server (or services), so I'll explore that next.

rwunderlich
Partner Ambassador/MVP

It's unlikely to be a memory leak, but more likely just cache. My understanding is that cache is not released when a document is unloaded.

However, cache should be trimmed when you hit the working set low threshold, so in the scenario you've described you should see cache trimming rather than paging. You are on an older SR. I would recommend updating to the latest (SR5) and retesting your model.

-Rob

Not applicable
Author

Thank you for the feedback, Rob.

Indeed, we're looking at upgrading our environment.  I'm curious about your opinion on 10 SR5 vs. 11 SR1.  I like many of the new features in version 11, but we've held off on upgrading because we're a bit conservative.  In our environment, upgrading production requires a bit of a process, so we don't typically take every SR or patch of an application unless there is a compelling reason to do so.

So given we are cautious, and considering stability and maturity of 11 SR1, would you recommend we move to 11 SR1, or instead stay with 10 and bump to SR5?

rwunderlich
Partner Ambassador/MVP

I think both versions are fairly stable, and both versions have a similar number of problems. I would vote for V11 due to the additional features. However, you may want to wait for SR2 which should be out soon.

-Rob