I forgot to update the post. Since I posted, I have been investigating and have figured out a couple of things:
-We use RDP to access the virtual machine that QV Server is installed on, and there we view the management console in IE.
-This popup message seems to be just a symptom. When the break happens, the management console is open on the Status tab with automatic refresh of the task list checked. So I figure this is just an IE popup for a refresh that failed due to a network error, which is why the message is only displayed when the management console is open. Also, reloads don't resume until we click OK. We have started closing the management console whenever we don't use it, so reloads resume without OK, but...
-When this break happens, our RDP access to this virtual machine is severed. No one can RDP anymore until we open the console in the vSphere client and log on there. After that RDP works again, and only then do reloads resume. (If the break happens after working hours, there are no reloads until someone logs on using the console.) The webserver service still has to be restarted for the access point to work (although its status shows as running).
-When one of these breaks happens, more soon follow. We restart the virtual machine, and then for some time breaks don't happen. Then it happens again and keeps happening, and we restart again... and so on.
-We are in the middle of an experiment. We are first trying to exclude reloads as a possible cause, so we disabled all reloads for a period of 2 days. No breaks happened. Now we are trying to pinpoint which reload source could be causing the problem. We reload documents both with the QV reload engine and with Windows scheduled tasks. At this stage all reloads are switched to scheduled tasks for the remainder of the week. Next week we will switch them all to the QV reload engine and compare the results.
Forgot to update again. Well, we have found, if not a solution, then a workaround, and our system is stable for now.
First: there was no difference between reloading through the reload engine and reloading through a scheduled task. Breaks still happened.
The closest thing we could find to a cause was a lack of system resources. We tried to replicate the break and noticed that there seems to be a memory leak somewhere. We would RDP to the QV server, open Task Manager and watch memory and CPU usage while we reloaded and edited documents. (Users from the whole company could still access documents through the web server service running on that server, but this testing was usually done after working hours, so few of them were active and their contribution to resource usage was minimal.)
We noticed that during reloads CPU and memory usage spiked, with memory usage increasing by about 30-40 percent. Every time a reload finished, a little more memory was left in use (cached) by the QlikView application. So little by little the memory used by QV increased, and when it got to about 70 percent and a reload pushed it to 99 percent - break happened! A similar thing happened with CPU usage. During a reload CPU spiked to 99 percent briefly. If we did 2 reloads at the same time, CPU would spike (memory also, but CPU first, so we identified CPU as the cause) - break happened! If a reload was in progress and we had the developer client open and were changing tabs or sorting data in a document (using CPU) - break happened!
So this is what we think is the cause, and in practice it proved plausible. On Monday we restart the server and memory usage starts at 20 percent. By the next Monday it gets to 70 percent and we are in the danger zone. Usually the break happens Monday night or Tuesday morning.
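The arithmetic behind that weekly pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: memory is assumed to creep up linearly between restarts (20 percent after a restart, 70 percent a week later), a reload is assumed to add a spike of about 30 percent on top, and the function names are hypothetical, not anything provided by QlikView.

```python
def baseline_percent(day, start=20.0, after_week=70.0):
    """Estimated memory use (percent) `day` days after a restart.

    Assumes linear growth from `start` to `after_week` over 7 days,
    which is a simplification of the behaviour described in the post.
    """
    daily_growth = (after_week - start) / 7.0
    return start + daily_growth * day


def first_risky_day(reload_spike=30.0, limit=99.0,
                    start=20.0, after_week=70.0, max_days=60):
    """First whole day after a restart on which the estimated baseline
    plus a reload spike reaches `limit` percent, or None if it never does
    within `max_days`."""
    for day in range(max_days + 1):
        if baseline_percent(day, start, after_week) + reload_spike >= limit:
            return day
    return None


if __name__ == "__main__":
    print("first risky day with a 30% spike:", first_risky_day(30.0))
    print("first risky day with a 40% spike:", first_risky_day(40.0))
```

With these assumed numbers a 30 percent reload spike first reaches the limit about a week after the restart, which is consistent with breaks showing up around the following Monday night or Tuesday morning.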
Our solution (workaround): Of course adding more resources helps, but management decided against that (although we tested it and it does help). The next thing we did was change a default setting in the management console: System tab -> QlikView Servers -> Performance tab, Working Set options set to 50 low and 70 high (as I understand it, this limits memory caching by the QV app). I wanted to decrease it further, but this seems sufficient for now. Lastly (since the above helped but didn't remove the breaks completely), we scheduled a server reset every Sunday night so that every Monday memory usage starts at 20 percent. Also, we never reload 2 documents at the same time.
So now we have a working server, although it is reset every week. Without the reset it now lasts about 3 weeks before a break happens.
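One way to enforce the "never two reloads at once" rule is to drive all reloads from a single script that runs them strictly one after another. The following is a minimal Python sketch, not the setup described in this post: `reload_one_at_a_time` and `run_with_command` are hypothetical helper names, and the command prefix stands for whatever command line your scheduled task already uses to reload a document.

```python
import subprocess


def reload_one_at_a_time(documents, run_reload):
    """Reload documents strictly sequentially.

    `run_reload` is any callable that takes a document path, blocks until
    the reload is finished, and returns True on success. The next reload
    only starts after the previous one has returned.
    """
    results = []
    for doc in documents:
        ok = run_reload(doc)
        results.append((doc, ok))
    return results


def run_with_command(command_prefix):
    """Build a `run_reload` callable that runs an external command.

    `command_prefix` is a list such as the executable and switches that
    your existing scheduled task uses (hypothetical here); the document
    path is appended as the last argument.
    """
    def run(doc):
        completed = subprocess.run(list(command_prefix) + [doc])
        return completed.returncode == 0
    return run
```

This only prevents overlap if every reload goes through the same script; reloads started separately from the QV reload engine or from other scheduled tasks would still be able to run in parallel.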
Now we are working on optimizing our QV documents, making them smaller and doing as little calculation as possible (letting the Oracle DB do the calculations, using procedures instead of mappings...), because I think this too contributes to the high CPU and memory usage during reloads. When that is done we will probably suspend the server resets and try again. I also think the memory leak is real and that it is caused by QV V12 (in V10 we had no problems), and maybe some future patch will address it.
Hope this helps.