We finally resolved it yesterday. I've attached a post-mortem on the problem.
In summary, our QVS was referencing something that no longer existed. In our case, an app that was no longer being used had been deleted. The jobs, however, were not deleted (I've spoken to my employees about this).
There were users attached to this app. Our I.T. department cleaned out inactive A.D. accounts, and once those accounts were removed, the problem started on our side.
Users are kicked out of QlikView every couple of minutes
On January 26, 2011 at 7:22:55 a.m., the QlikView Server service (QVS) began restarting itself every 2 to 4 minutes. Each restart kicked users out of their applications. We began by looking at the performance log. Below is a screenshot of the file showing when the issue arose.
The server started doing this because one of its configuration files appears to have been corrupted. QV crashed about 5 hours after the DW crashed; I'm pretty sure the two events are NOT related. Our goal was to find the corrupt file and restore it. Since no file name was listed in the logs, this was going to be an arduous process.
We tried several options but I'll only describe the solution that worked.
The QVS creates three log files:
· Performance.log tracks server usage: number of users, which documents are open, and whether the services are up.
· Event.log records application activity, essentially telling us what is happening to the QV server.
· Session.log records user-specific activity.
We can set the verbosity at which QlikView logs its activity. We normally keep all of our logging at low or medium verbosity; we changed it to high/debug verbosity to get more background information on the issue at hand.
We then looked at the performance log, using the server restart of 2011-01-26 10:09:35 as our reference point.
We then took a look at the server's event viewer (shown below). We know the server is restarting due to a calculated memory peak, since it restarts every time we hit about 4 GB. The event viewer below confirms this by pointing us to the working set limits.
We then went to the event log (shown below) and confirmed that the server restarts are tied to the working set limits.
We confirmed that none of these settings had changed. We also confirmed that no other settings had changed (I sent a note to Network Engineering, Dev Support, and local QV support).
We went back to the event viewer and looked for any events that correlated with the working set events. I saw a message stating that the QVS was trying to translate IPCFCDOM\lshands. We know that lshands no longer works at Simplexity. I initially dismissed this message, since I could not find a logical reason why an A.D. account belonging to a former employee would cause my server to restart, but I found that it preceded the working set message every time it showed up.
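The correlation check above can be sketched in a few lines of Python. This is not our actual tooling, and the log lines below are illustrative only (QlikView's real Event.log layout differs); it just shows the idea of confirming that every stale-account message is followed by a working set message:

```python
# Sketch: verify that each "translate lshands" message is later
# followed by a "working set" message in an event log.
# Log lines here are made up for illustration.

def correlated_pairs(lines, first_marker, second_marker):
    """Yield (i, j) index pairs where a line containing first_marker
    is later followed by a line containing second_marker."""
    pending = None
    for i, line in enumerate(lines):
        if first_marker in line:
            pending = i
        elif second_marker in line and pending is not None:
            yield (pending, i)
            pending = None

log = [
    "07:22:10 Information: Server started",
    "07:22:40 Warning: Failed to translate IPCFCDOM\\lshands",
    "07:22:55 Error: Working set limits exceeded, restarting",
    "07:25:01 Warning: Failed to translate IPCFCDOM\\lshands",
    "07:25:20 Error: Working set limits exceeded, restarting",
]

pairs = list(correlated_pairs(log, "lshands", "Working set"))
print(pairs)  # [(1, 2), (3, 4)] - each translation failure precedes a restart
```

If the pairing holds for every occurrence, as it did for us, the stale account is a strong suspect.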
I then went to accesspoint (FCQVPROD01…\D:\Accesspoint\) and searched all of the configuration files for his name. Srini showed me an easy way to do this: go to the command line, change your directory to d:\Accesspoint, and type the following command:
find /c "lshands" *.qvw.meta
This command counts the lines containing the text lshands in each file matching *.qvw.meta in the d:\Accesspoint folder.
If the app had been decommissioned, I deleted the *.qvw.meta files that contained his name. If the app was still being used, I deleted the file and then redistributed the app via the QEMC to recreate a new one.
I suspect that a cleanup of ex-employee accounts was done in A.D. just before this issue started, and the QVS restart after the DW crash simply brought it to light.