7 Replies Latest reply: Sep 19, 2012 11:45 PM by Rob Wunderlich

    Capacity Planning (Server RAM)

      We have a QV Server and for now just have a single dashboard running.  As we prepare to add dashboards, we are looking at capacity planning for the server.  I'm using the formulas gathered from QlikTech documentation, but the numbers are just not what I expect, so I'm hoping for guidance from folks here who have experience with enterprise deployments.

       

      Some background:

           Server RAM:  96,000 MB

           Dashboard size = 75 MB

           Users (maximum if all licenses in use concurrently) = 550

       

      Calculations:

           Server resources:  1,000 MB for the OS + 30 MB for QV Server

           User 1:  75 * 7 (using a rather pessimistic multiplier of 7 on a scale of 4 to 10) = 525 MB

           Additional users (using a 10% additional-users factor, again pessimistic):  549 * (525 * 10%) = 28,823 MB

           Total for users = 29,348 MB; add 400 MB for QV services, giving 29,748 MB.
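      The arithmetic above can be sketched as a quick script. The multiplier (7) and per-additional-user factor (10%) are the rule-of-thumb values from the QlikTech documentation as used above, not measured numbers:

```python
# Rough QlikView Server RAM estimate, following the rule-of-thumb
# formula used in this post. All values are in MB.

def estimate_ram_mb(doc_size_mb, users, multiplier=7, extra_user_factor=0.10,
                    os_overhead=1000, qvs_overhead=30, services_overhead=400):
    """First user costs doc_size * multiplier; each additional user
    adds a fraction (extra_user_factor) of that first-user footprint."""
    first_user = doc_size_mb * multiplier
    additional = (users - 1) * first_user * extra_user_factor
    return os_overhead + qvs_overhead + first_user + additional + services_overhead

total = estimate_ram_mb(doc_size_mb=75, users=550)
# ~30,778 MB: the 29,748 MB above plus the 1,030 MB OS/QVS overhead
print(f"Estimated RAM: {total:,.0f} MB")
```

Adjusting the multiplier and factor arguments makes it easy to see how sensitive the estimate is to those two rule-of-thumb inputs.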

       

      This assumes, of course, 550 concurrent users (since this dashboard is relatively new, actual maximum usage is much lower).  In fact, to date I don't think we've ever exceeded about 150 concurrent users.

       

      Yet when I examine the processes on the server during an idle time when no users are online, I see (with just this one dashboard on the server, currently) about 70 GB of RAM devoted to the qvs.exe process.  Further, I see the following performance numbers at the console:

       

      VMCommitted: 67090.855469
      VMAllocated: 75493.734375
      VMFree: 22807.144531
      VMLargestFreeBlock: 22807.144531

       

      So am I missing something here with regard to how I've modeled the calculations to estimate memory requirements?

       

      Are the formulas above correct?  Perhaps the first-user multiplier (7) and/or the additional-users factor (10%) is not right, although both are pessimistic based on the ranges QlikTech provides.

       

      Why would the qvs.exe process claim this much RAM?

       

      Thoughts and ideas most appreciated.

        • Re: Capacity Planning (Server RAM)
          Brent Nichol

          Because you are only serving a single document, it should be relatively simple to investigate the RAM usage.

           

          What is the RAM usage when the document initially loads?

          How quickly does the RAM ramp up?  Does it constantly run at 70 GB?

          How are users connecting to the document?  IE-Plugin, AJAX?

          How often is the document reloaded?  Do you have the QVS configured for only a single copy of the document?

          Have you run the document analyzer on the document?  It can be found at http://robwunderlich.com/downloads/

          Is Collaboration turned on?  Have you investigated the Collaboration objects using the power tools? http://community.qlikview.com/docs/DOC-3059

          How often do you restart your server or the QlikView Server service?  We do a daily server restart to eliminate any long-term memory-leak buildup.
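          If you go the scheduled-restart route, a sketch of what the nightly job might look like is below. The service name "QlikViewServer" and the script path are assumptions; check the actual service name with `sc query` on your server:

```python
# Sketch: commands for a nightly restart of the QlikView Server
# service to clear accumulated memory. "QlikViewServer" is an
# assumed service name; verify it on your own server first.

SERVICE = "QlikViewServer"  # assumption; confirm with `sc query`

def restart_service_commands(service=SERVICE):
    """Commands the scheduled task's batch file would run off-hours."""
    return [f'net stop "{service}"', f'net start "{service}"']

def schedule_task_command(bat_path=r"C:\scripts\restart_qvs.bat", at="03:00"):
    """schtasks one-liner to register the restart as a daily task
    (bat_path is a hypothetical location for the batch file)."""
    return (f'schtasks /Create /TN "QVS nightly restart" '
            f'/TR "{bat_path}" /SC DAILY /ST {at} /RU SYSTEM')

for cmd in restart_service_commands():
    print(cmd)
print(schedule_task_command())
```

Put the two `net` commands in the batch file, then run the generated `schtasks` line once to register it; schedule it for a window when no users are connected.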

          The RAM usage is primarily based on the volume of unique data in the document, but we have some charts that are using calculations that can cause 10 - 20 GB spikes in usage.

          Regards,

          B

            • Re: Capacity Planning (Server RAM)

              Thanks so much again, bnichol.

               

              I have details of my testing in the attachment of the other reply here.  To clarify a few things you asked about: we're using QV 10 SR2 with the IE plugin, the document is reloaded once each day, and only one copy is allowed in memory.  With regard to collaboration, we only use server bookmarks (so far).

               

              As you've suggested, I think memory leak/creep is the main issue here.  We don't currently do a scheduled restart of the server (or services), so I'll explore that next.

                • Re: Capacity Planning (Server RAM)
                  Rob Wunderlich

                  It's unlikely to be a memory leak; more likely it's just cache.  My understanding is that cache is not released when a document is unloaded.

                   

                  However, cache should be trimmed when you hit the working set low threshold, so in the scenario you've described you should see cache trimming rather than paging.  You are on an older SR; I would recommend updating to the latest (SR5) and retesting your model.

                   

                  -Rob

                    • Re: Capacity Planning (Server RAM)

                      Thank you for the feedback, Rob.

                       

                      Indeed, we're looking at upgrading our environment.  I'm curious about your opinion on 10 SR5 vs. 11 SR1.  I like many of the new features in version 11, but we've held off on upgrading our environment because we're a bit conservative.  In our environment, upgrading production requires a bit of a process, so we don't typically take every SR or patch of an application unless there is a compelling reason to do so.

                       

                      So given that we are cautious, and considering the stability and maturity of 11 SR1, would you recommend we move to 11 SR1, or instead stay with 10 and bump to SR5?

                • Re: Capacity Planning (Server RAM)
                  Miguel Angel Baeyens de Arce

                  Hi,

                   

                  A few notes here. That 10% is an average, and depends largely on how the document is developed: number of rows, length of values, data model schema, level of granularity and distinctness of data, number of charts, concurrency, cached selections, and so on.

                   

                  Besides, all documents stay in memory until their timeout expires, so you may find that a document nobody is using is still loaded, which makes sense. If you are preloading, that uses RAM as well. The same happens with cached selections: the more objects you have, the more RAM you will use until the working set limit is reached and the server starts freeing memory to cache new queries. If you are using section access and reduction, that takes some RAM as well...

                   

                  In addition, DMS authorization stores info in the .Shared and .Meta files, as do documents set up for collaboration, notes, shared objects, and bookmarks...

                   

                  In this sense, concurrency may "happen" even when users are not logged in, because their copy of the document is still in memory, depending on the timeouts.

                   

                  For a more accurate estimate, I'd measure RAM usage at each of the following steps (you may add as many additional steps as you want to make the review even more accurate):

                  • Make sure the preload option is set to "never" in all documents (while testing)
                  • For testing purposes, set all document timeouts to a very low value (2 or 3 minutes)
                  • Reboot the computer, log on, and check
                  • Open QlikView Desktop, open the QVW file, and check
                  • Close QlikView Desktop
                  • Start the QlikView services and check
                  • Restore the preloading settings in all documents, if any, and check
                  • Have one user log on and load a document using Ajax, and check
                  • (Same with the IE plugin if users will use it)
                  • Have a second user (while the first is still connected) log in, open a document, and check
                  • Log off both users, let the document time out, and check

                   

                  From this point on, each user should add roughly the same amount of memory on average. Now start clicking: CPU will go up while computing, and RAM will climb slowly as well, since each document selection is cached. You should see some RAM freed when the timeout is reached.
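                  A checklist like this is easier to follow if the qvs.exe reading is captured the same way at every step. A minimal sketch for Windows is below; it shells out to `tasklist`, whose CSV output reports memory as e.g. "71,680 K" (the process name qvs.exe is from this thread; everything else is an assumption):

```python
# Record qvs.exe RAM usage at each test step (Windows only).
# `tasklist /FO CSV` reports the Mem Usage column as e.g. "71,680 K".
import csv
import io
import subprocess

def parse_tasklist_csv(csv_text, image_name="qvs.exe"):
    """Return memory usage in MB for the named process, or None if absent.
    tasklist CSV columns: Image Name, PID, Session Name, Session#, Mem Usage."""
    for row in csv.reader(io.StringIO(csv_text)):
        if row and row[0].lower() == image_name:
            kb = int(row[4].replace(",", "").replace(" K", ""))
            return kb / 1024.0
    return None

def qvs_memory_mb():
    """Query the live process list on the server (Windows only)."""
    out = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq qvs.exe", "/FO", "CSV", "/NH"],
        capture_output=True, text=True).stdout
    return parse_tasklist_csv(out)

# Call qvs_memory_mb() after each step in the checklist and log the result.
```

Logging the returned value with a timestamp after every step gives a table you can compare directly against the sizing formula.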

                   

                  Hope that makes sense and the tests go well. Let us know either way.

                   

                  Miguel

                  • Re: Capacity Planning (Server RAM)

                    Thanks so much, bnichol and Miguel.  Very helpful information.  I wasn't aware of Rob's Document Analyzer, and I find that a very useful tool.

                     

                    I think I'm seeing the very high memory utilization for the qvs.exe process on the server simply due to memory creep over time, no doubt attributable to a leak as the document is loaded/unloaded and many users open and close it over a long period.  I'll have to look into automating a regular restart (or at least, stopping and starting the service).

                     

                    I ran a number of tests and recorded memory usage for the Server service (qvs.exe) to confirm.  I've attached the results for review by others, if interested.

                     

                    Details are in the attached workbook, but to summarize:

                     

                    Using QV10 SR2 on a server (Windows Server 2008 R2 SP1).

                    The dashboard is a 70 MB document with approx. 1.5 million records in the fact table

                     

                    For each round of testing, I stopped and restarted the service to begin with a clean slate (on startup the service uses < 8 MB of RAM).

                     

                    For single-user testing, I ran three types of test, each with multiple iterations, recording memory after each.  For multi-user testing, I ran with three users, each performing identical functions.  In both single- and multi-user testing, I first tested just opening the document, then closing it.  Then I tested opening and viewing a handful of sheets.  Finally, I stress-tested by opening the document, visiting all sheets, and viewing all charts (~125 charts across 18 sheets).

                     

                    As you can see from the results in the attached workbook, the memory required expands more or less as expected with additional users.  However, memory is not released after each document close (after timeout), so over time this results in more and more RAM being allocated.  Once the working set low threshold is exceeded, memory is swapped to disk, which of course degrades performance.
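                    One way to put measurements like these to work is to back out the actual additional-user factor from observed RAM at one user versus N users, rather than relying on the 10% rule of thumb. A sketch is below; the numbers are placeholders for illustration, not the workbook's figures, and the overhead value is an assumption:

```python
# Derive the per-additional-user factor from two RAM measurements,
# so the capacity model uses observed values instead of the
# generic 4-10x multiplier and 10% rules of thumb.

def additional_user_factor(ram_one_user_mb, ram_n_users_mb, n_users,
                           base_overhead_mb=1430):
    """Fraction of the first user's footprint each extra user adds.
    base_overhead_mb approximates OS + QVS overhead (assumed value)."""
    first_user = ram_one_user_mb - base_overhead_mb
    per_extra = (ram_n_users_mb - ram_one_user_mb) / (n_users - 1)
    return per_extra / first_user

# Placeholder numbers for illustration only:
factor = additional_user_factor(ram_one_user_mb=1955,
                                ram_n_users_mb=2060,
                                n_users=3)
print(f"Observed additional-user factor: {factor:.2f}")
```

Feeding the derived factor back into the original formula turns the rule-of-thumb estimate into one calibrated against this particular dashboard.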

                     

                    This testing has helped me tweak our capacity planning model (at least for this particular dashboard).  I've been testing on our TEST server, so I will look into scheduled restarts of the server (or services) in our PROD environment, then monitor as we move forward and see how it goes.