This may not be the answer you're looking for, but I offer it as a preliminary check.
Anyway, the performance issue may occur for many reasons, such as improper data modelling or improper use of expressions at chart level.
To make sure that everything is OK at chart level, you can check the memory statistics of an application.
To do this, open the application -> Settings -> Document Properties -> General tab -> click Memory Statistics.
Save the file.
Open a new application and use this .mem file to reload data.
Once you reload data from this file, you will see each object ID and its average calculation time.
This will help you identify whether you have any issue at chart level.
My background is in performance management, not QlikView, so I'll take those recommendations back to the team that built the documents.
However, I am confident I've identified the source of the performance issues as heavy use of the paging file. What I'm trying to determine is what to do about that. As I see it, I have two options:
1) Add more memory
2) Use the existing memory more efficiently.
Before I start digging into the efficiency of the documents, I think I need to ascertain whether the QlikView settings are optimal.
As you have read in the PDF, the working set low means that QlikView will use up to that much memory without asking the OS. The high means that it will start swapping, but the Events log in the QlikView logs folder should record that some documents are using more virtual memory than expected.
How is the pagefile size managed: is it a manual value, or is it left to the OS? It may be conflicting with the 5+ GB that you have left for the OS and the rest of the applications with the high limit at 97%. Note that the working set is a Windows OS feature (http://msdn.microsoft.com/en-us/library/windows/desktop/cc441804(v=vs.85).aspx), and it controls QVS.exe, but not the rest of the QlikView services that may be running on the same computer, along with antivirus software, backups, etc.
Check the Windows Event Viewer and the QlikView Events log (the System Monitor can help you with that: http://community.qlik.com/docs/DOC-4307) to see when this happens and when the OS starts paging.
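As a quick sketch using standard Windows tools (nothing QlikView-specific), you can answer the pagefile question above from a command prompt:

```shell
:: Is the pagefile system-managed, or set to a manual value?
wmic computersystem get AutomaticManagedPagefile

:: Allocated size, current usage and peak usage of each pagefile, in MB
wmic pagefile get Caption,AllocatedBaseSize,CurrentUsage,PeakUsage
```

Comparing PeakUsage against AllocatedBaseSize gives a first indication of whether the box has been leaning on the pagefile at all.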
Hope that helps.
Thanks for clarifying that. I guess that rather than clearing the cache, QVS is paging to disk; paging occurs when the QVS process reaches 176000MB, which is about 90% of 192GB.
The paging file is 16384MB, and the peak usage I've seen on this has been 40%. That's equivalent to about 4% of the RAM that QVS is using. Is that a significant amount, or pretty typical when the working set for QVS reaches its min value?
I surmise that if I reduce the working set min/max values it won't necessarily prevent paging from occurring; it will just page sooner (in relation to total memory usage).
btw I've had a look in the "Events" log and all it's telling me is the same couple of errors occurring a couple of dozen times a day:
2013-07-15 10:39:19 2013-07-15 18:12:12 2 500 Warning Document Load: The document \\HBEU.ADROOT.HSBC\DFSROOT\GB002\COREP_MI_QLIKVIEW\QLIKVIEW FILES\Fermat\FermatQlikViewApp\Variance Report R7a.qvw failed to load because of no file access .
2013-07-15 10:39:19 2013-07-15 19:48:17 2 500 Warning Session Recovery Failed: The document QlikView Files/Fermat/FermatQlikViewApp/June13 Month End Report R7a.qvw failed to apply $LASTKNOWNSTATE
...nothing about memory. There are a couple of fields called "Bytes Sent" and "Bytes Received" in the Sessions logs, but I'm not sure what to make of them.
Just to throw in my two cents: it's important to understand that QlikView Server will never deliver a good user experience if the QVS task is paging at all. It all has to live in RAM to give good performance. So, as you've stated, you need to either increase available RAM or decrease the RAM demand of the apps.
Confirm a couple of things --
1. Are there apps other than QV running on this box?
2. Are reloads being done on this box or another server?
To answer your original question. If you reduce the WS Max, you will begin paging sooner. If you reduce the WS Min, you should not begin paging sooner. However, you will begin cache trimming sooner, which may be a good thing but probably won't resolve your overall problem.
If you analyze the QV documents, you may find that one or two are contributing to the bulk of the RAM usage and those would be the best tuning candidates.
Thanks for your reply. It's actually a clustered environment. There are two load-balanced servers acting as the QVS servers; one of these acts as the web server too, and those are the ones with the occasional high paging. There's a third server (I forget the description), but that does all the other stuff like reloads etc.
You're confirming what I had considered: there's a reason why the data is in memory, so if we start trimming sooner it could just change the nature of the problem, if the users need to recall or recreate the data that was trimmed.
As I say, my background is performance management, not QlikView, so I want to eliminate environmental/config constraints before I go back to the developers and start looking at the documents.
Once again, thanks.
I'm trying to determine why those values were chosen by the build team, but I read in the technical document:
"For servers with large RAM (i.e. 256 GB of RAM) these settings can be changed to allocate a couple of gigabytes of RAM for the operating system and allow the remaining RAM to be used by the QlikView Server."
I'd consider 192GB to be large, so does the 70/90 rule still apply? The recommendations I've seen in the documentation all sound a little vague to me.
Hello Shane, I have the same problem. Did you reach any conclusion? Is there any QlikTech suggestion about the configuration of the pagefile?
I think you have shown that QVS is swapping at the WSL threshold. That is not working as designed (according to the theory I know), and I think I have the same behaviour on QVS 11.2 SR5.
Hi David, our issue was that we were on v10 R4 and were experiencing memory leaks. In brief, after a reload a second document remained in memory and could not be flushed, hence when we hit WSL we started paging. We've upgraded to v11 SR2, and memory leaks are much less frequent, though they do sometimes occur. Some people will tell you that QlikView does not have memory leaks, but they're mistaken; unfortunately, many people report normal behaviour as memory leaks because they don't understand QlikView, which lends credence to those who claim QlikView does not experience memory leaks.
The chances are, though, that you just need to configure your environment and are not experiencing a true issue with QlikView, so the onus is on you to do some analysis and tweaking. You need to start monitoring the QVS process to see how it grows and understand whether you've got enough RAM for all your documents - they'll be about 4.25x as big in memory as on disk. Then you'll need more RAM for cached results, so you need to understand how quickly that gets depleted with user activity. Depending on the size of your environment you may wish to alter the working set low - we're currently at 92% and 96%, but with 288GB of RAM. This allows about 22GB of RAM for processes other than QVS. This was a figure arrived at after analysis of usage, not guesswork.
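The 4.25x figure is the poster's rule of thumb, not an official constant; as a sketch, a rough capacity check for one document might look like this (the 2048 MB on-disk size is a made-up example):

```shell
:: Rough capacity check: estimated RAM footprint = on-disk size (MB) x 4.25
:: (cmd arithmetic is integer-only, so scale the multiplier by 100)
set /a DISK_MB=2048
set /a EST_RAM_MB=DISK_MB*425/100
echo Estimated in-memory footprint: %EST_RAM_MB% MB
```

Summing that estimate across all documents, plus headroom for cached results, gives a first guess at whether the server's RAM is adequate.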
What's important is that you understand what else is consuming memory. If Publisher is on the same server, then the QVB process(es) will be directly competing for RAM with QVS, and paging will occur long before hitting working set low. If that's the case, you may wish to have a separate Publisher.
You should monitor the QVB process(es) sizes when tasks are running and see if they are consuming more RAM than the server has, hence using the pagefile. We have a process that runs once a month that consumes about 120GB, and the server is 100GB, but as we have a separate Publisher and it's only once a month for 2 hours, it's not worth investing in more hardware.
Take a look at these 2 documents on how to use Perfmon to monitor QlikView usage:
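As a sketch of the kind of collection those documents describe, a perfmon counter log can also be created from the command line with logman (the log name, output path, and the "QVS" instance name are assumptions; check the exact instance name in Perfmon first):

```shell
:: Create a counter log sampling every 60 seconds, written as CSV
logman create counter QVS_Memory -f csv -si 60 -o C:\PerfLogs\QVS_Memory ^
  -c "\Memory\Available MBytes" "\Paging File(_Total)\% Usage" "\Process(QVS)\Private Bytes"

:: Start (and, when done, stop) the collection
logman start QVS_Memory
logman stop QVS_Memory
```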
n.b. There's no configuring of the pagefile that you need to do; you need to configure your environment to avoid using it if possible.
Thank you very much Shane, you have done a great analysis and tuning job. And thank you so much for sharing it with us.
I've also seen that sometimes our memory leaks were caused by a big distribution. We have some gigantic documents, and I leave enough free RAM between the WSL and the WSH to accommodate the new version without paging.
We have two dedicated Publishers and twelve QVS/QVWS servers distributed across three clusters in our production environment, so QVB is not competing for the resources.
The behaviour that I have seen is that QVS begins growing its working set beyond the configured parameters; it stays in that state for days, and eventually it stops working.
The theory we know is that once the WSL is reached, QVS begins to remove cached aggregations and objects from RAM in order to bring the working set below the threshold. Also, in theory, the OS is asked not to page below this level.
We have seen in the QVS logs that it stays with that message for days, and from the data in the QVS performance log it seems that it doesn't remove anything from RAM. So I think this could be a product bug still present in 11.2 SR5: not flushing cached aggregations correctly, and paging even while under the WSH.
You have explained very accurately the moment you saw QVS begin swapping; I don't know how you did it. I don't have experience analysing memory consumption on Windows. Could you help me with this?
I don't rightly understand the metrics that QlikView shows; I think they don't match the OS statistics. For example, take the value VMCommitted, which has this definition in the QMC help:
"Size in MB of virtual memory actually used by QlikView Server at the end of the interval. This number is part of VMAllocated(MB) and should not exceed the size of the physical memory in order to avoid unacceptable response times."
It doesn't match the working set value of the process in Resource Monitor (maybe because that is just physical memory), but I have also seen that the perfmon \Paging File(*)\% Usage counter is too low to be the other part of that value. I don't know how to see the swap usage correctly.
I also think that one of our problems is that we have a system-managed pagefile, and it is the same size as the RAM. That's why I'm looking for vendor guidelines on how to configure this setting; I only have a v10 PDF that indicates 150% of RAM size, and I also think that QlikView is an in-memory product that must live only in RAM. So what size have you left for the pagefile? I was thinking about 8GB for the OS.
Thank you very much,
From my experience, the difference between WSL and WSH has no relevance. It's the WSL figure that is key.
We have an application that reloads every night. When the reload is complete, the new version gets distributed and then loaded into memory to replace the old version. However, more often than not the older version does not drop out of memory (even though "only allow 1 version in memory" is enabled).
This extra document cannot be flushed from cache like the "cached results", therefore the system starts paging when we reach WSL. We've had a case open with Qlik since January; we were able to prove this was a genuine memory leak (as the only system activity at this time is the reload), and we provided them with a document so they were able to recreate the issue. However, they still have not come up with a fix. Therefore we schedule regular restarts of QVS to avoid running out of memory.
My advice would be to ignore the QlikView performance logs and instead set up Windows perfmon monitoring as per the links above. Collect and look at the following 3:
\Paging File(_Total)\% Usage
\Memory\Available MBytes
\Process(QVS)\Private Bytes
When Paging % Usage starts growing, you want to compare that to how many Avail MBytes you have left and the size of QVS in Private Bytes. If you've run out of Avail MBytes when paging occurs then your WSL is too high; if you've got plenty (i.e. more than 5GB) spare, then adjust your WSL upwards.
n.b. The '% Committed Bytes In Use' metric combines RAM and the paging file, so if you don't understand this and don't know the size of your paging file, it can be misleading.
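For a quick live look at those three counters before committing to a full perfmon collection, typeperf should do (again, the "QVS" process instance name is an assumption; verify it in Perfmon):

```shell
:: Take 5 samples, at the default 1-second interval, of the three counters above
typeperf "\Paging File(_Total)\% Usage" "\Memory\Available MBytes" "\Process(QVS)\Private Bytes" -sc 5
```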
We also have a system-managed pagefile; this is the best way to go. However, what you want to do is avoid using the paging file totally for QlikView (and not worry about how big it is set to). You really need to understand your application and how it consumes memory over time. For example, our 10-12 documents have a base footprint of 80-100GB, and at peak times we consume 40GB an hour with cached results etc. When we reach WSL, QlikView becomes unpredictable, so we do cache flushes and recycles to avoid hitting that point.
If you've got a large document (in memory), you don't want to be at WSL when it has finished reloading. QlikView does not seem to cope well with loading a document into memory and flushing cache at the same time.
In our case the difference is relevant because of our XL documents.
Our behaviour is a little bit different from yours; memory doesn't grow so quickly, at least in the performance log. I have also had a case open since June. I will tell you if I find anything interesting, and please let me know if they give you a fix.
Thank you very much for the explanation of perfmon. Yesterday I was also studying Get-Process, and I finally focused on the PagedMemorySize64 and PagedSystemMemorySize64 properties. I will also keep an eye on the metrics you mention.
Just one question: when you say "we have cache flushes and recycles to avoid hitting that point", what exactly do you mean? Just restarting QVS, or is there a better way?
There's a line you can put in Settings.ini to flush the cache.
Within the [Settings 7] configuration section, add the entry:
Qlik don't recommend a frequency of more than 3/4, and to be honest I would avoid using it. What you can do instead is recycle QVS using a .bat file triggered by a Windows scheduled task. Copy and paste the below into a ".bat" file:
:stop
sc stop QlikViewServer
:loop
rem Wait roughly 120 seconds for the service to shut down
ping 127.0.0.1 -n 120 -w 1000 > nul
sc query QlikViewServer | find /i "STATE" | find "STOPPED"
if errorlevel 1 goto stop
rem If the service is still listed as running, wait again before starting
net start | find /i "QlikViewServer" > nul && goto loop
sc start QlikViewServer
The clear-cache setting is what Qlik refer to as an Easter egg: it's additional functionality that's not supported. You cannot set times to clear the cache, only a frequency, i.e. 1 = midnight GMT, 2 = midnight & midday GMT, 3 = midnight, 8am and 4pm GMT, etc.
It works well at midnight when the system has no users, but is less successful at midday when it's busy. I suspect it may also be causing instability, so I'm looking at moving away from it as an option. I suggest you monitor carefully for issues if you do implement it.
You are on the right path.
In my world I have the following challenges: "user behaviour" and poor design of the apps.
We have flexible tables (poor design: no restriction on how much data you can load and/or export).
Analysts love their Excel, and they want to dump data from QV apps to Excel.
Initially it was 100,000 records, and it grew to 1 million records and more. Everyone was wasting their time creating their own reports and then validating each other's reports to find out whether the data was accurate. What a waste of time and resources.
We are educating our consumers to use the online dashboards and reports to do their analysis on the fly instead of downloading to Excel.
Memory usage and recycling used memory has been a challenge.
We have made the following changes.
- Document timeout is set to 30 mins
- Performance tuning
- Stop and start QVS once daily
- Dump cached memory every 3 hours (starting from midnight)
These steps have led to better memory management. We have also performed a hardware upgrade, moving from 256 GB to a 512 GB cluster model with Publisher on a different server. We still suffer business disruption because of user behaviour. We have done as much as possible from the server admin position.
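For the daily stop/start, one way to drive it is a Windows scheduled task pointing at a restart script like the .bat earlier in the thread (the script path and the 01:00 run time here are hypothetical examples):

```shell
:: Restart QVS every day at 01:00, running as SYSTEM
schtasks /create /tn "QVS Daily Restart" /tr "C:\Scripts\RestartQVS.bat" /sc daily /st 01:00 /ru SYSTEM
```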
We are looking at audit tables to tell us who the individuals are that are still dumping large sets of data to txt or Excel.
Then we will have a conversation with these individuals: why are they not following the best-practice model? Do they have valid business requirements that the current apps do not meet?
Since this topic was closed we've moved to a new environment with 2x 768GB QVS servers. We still do regular restarts of QVS, but we've stopped doing the clear cache as it seemed to lead to instability of QVS when users were on the system. Most importantly, though, after butting heads with the business for months, we got someone in from Qlik to review the documents, who basically said the same thing we'd been saying about poor design, and now they've redesigned the documents at long last.