10 Replies. Latest reply: Oct 20, 2015 8:08 AM by Tyler Waterfall

    Operations Monitor Load Failing and taking an average of 35 Min when successful

    Tim Weisbrod

      I have a new Qlik Sense Server 2.1.1 installation that is only about a week old.

       

      I am seeing CPU spikes and load errors that, as near as I can tell, are caused by the Operations Monitor reload starting to take a very long time.

       

      Judging by the Operations_Monitor_Reload_Stats_2.1.txt file, the number of rows it tries to import keeps growing, the reload completes successfully fewer and fewer times each day, and it takes longer and longer when it does complete.  It also appears to cause other reload tasks to fail while it is running (which I'm guessing creates more log entries and makes this problem even worse).

       

      I had some installation challenges based on how I was trying to securely configure this in an AWS VPC, but have those sorted out now.  Is this because of log files generated during that time?  I don't need the historical log files at this point (and would actually rather get them out of the Operations Monitor app).

       

      What's the best way to purge existing log data from the logs and the app?

       

      Also of note: the CPU spikes that tend to happen during the load start about 30 minutes into the load process.  I upgraded the server to a c4.4xlarge (16 vCPUs and 30 GB of RAM), and it still manages to peg all 16 vCPUs.  In the Operations Monitor app, on the Performance sheet, when I try to expand any of the hour rows, RAM usage spikes until memory is exhausted, and then the app comes back with the error "Out of calculation memory".

       

      Thanks in advance for any assistance

       

      Performance Summary Pivot.png

       

      RAM spike when trying to expand the Performance Summary

       

       

      RAM Spike.png

       

       

      Most Recent Operations_Monitor_Reload_Stats_2.1.txt Entries

       

      299 | 2015-10-11 00:07:50 | INFO | FD-QLIK-1 | Reload Start | Reloading Operations Monitor 2.1 from FD-QLIK-1 running version 2.1.1+Build:22.origin/release/ms13 | Operations Monitor
      300 | 2015-10-11 00:08:12 | INFO | FD-QLIK-1 | Reload Finish | Reloaded at 2015-10-11 00:08:12 on fd-qlik-1 for 00:00:22 with 94,816 log entries. | Operations Monitor
      301 | 2015-10-11 01:07:50 | INFO | FD-QLIK-1 | Reload Start | Reloading Operations Monitor 2.1 from FD-QLIK-1 running version 2.1.1+Build:22.origin/release/ms13 | Operations Monitor
      302 | 2015-10-11 01:09:59 | INFO | FD-QLIK-1 | Reload Finish | Reloaded at 2015-10-11 01:09:59 on fd-qlik-1 for 00:02:09 with 142,198 log entries. | Operations Monitor
      303 | 2015-10-11 02:07:50 | INFO | FD-QLIK-1 | Reload Start | Reloading Operations Monitor 2.1 from FD-QLIK-1 running version 2.1.1+Build:22.origin/release/ms13 | Operations Monitor
      304 | 2015-10-11 02:13:32 | INFO | FD-QLIK-1 | Reload Finish | Reloaded at 2015-10-11 02:13:32 on fd-qlik-1 for 00:05:42 with 232,630 log entries. | Operations Monitor
      305 | 2015-10-11 03:07:50 | INFO | FD-QLIK-1 | Reload Start | Reloading Operations Monitor 2.1 from FD-QLIK-1 running version 2.1.1+Build:22.origin/release/ms13 | Operations Monitor
      306 | 2015-10-11 03:19:20 | INFO | FD-QLIK-1 | Reload Finish | Reloaded at 2015-10-11 03:19:20 on fd-qlik-1 for 00:11:30 with 360,970 log entries. | Operations Monitor
      307 | 2015-10-12 13:07:50 | INFO | FD-QLIK-1 | Reload Start | Reloading Operations Monitor 2.1 from FD-QLIK-1 running version 2.1.1+Build:22.origin/release/ms13 | Operations Monitor
      308 | 2015-10-12 13:43:30 | INFO | FD-QLIK-1 | Reload Finish | Reloaded at 2015-10-12 13:43:30 on fd-qlik-1 for 00:35:40 with 881,305 log entries. | Operations Monitor
      309 | 2015-10-12 14:07:50 | INFO | FD-QLIK-1 | Reload Start | Reloading Operations Monitor 2.1 from FD-QLIK-1 running version 2.1.1+Build:22.origin/release/ms13 | Operations Monitor
      310 | 2015-10-12 14:43:38 | INFO | FD-QLIK-1 | Reload Finish | Reloaded at 2015-10-12 14:43:38 on fd-qlik-1 for 00:35:48 with 882,142 log entries. | Operations Monitor
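
For anyone who wants to track this trend without reading the file by eye, here is a minimal Python sketch that pulls the reload duration and log-entry count out of the "Reload Finish" messages. It only assumes the message wording shown in the entries above ("for HH:MM:SS with N log entries"); the function names `parse_finish` and `summarize` are made up for this example, and other Operations Monitor versions may word the message differently.

```python
import re
from datetime import timedelta

# Matches the tail of a "Reload Finish" message, e.g.
# "... for 00:35:40 with 881,305 log entries."
FINISH_RE = re.compile(r"for (\d{2}):(\d{2}):(\d{2}) with ([\d,]+) log entries")


def parse_finish(line):
    """Return (duration, entry_count) for a Reload Finish line, else None."""
    match = FINISH_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(g) for g in match.groups()[:3])
    entries = int(match.group(4).replace(",", ""))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds), entries


def summarize(lines):
    """Collect (duration, entry_count) for every finish line, in file order."""
    return [r for r in map(parse_finish, lines) if r is not None]


if __name__ == "__main__":
    # Two messages copied from the entries above.
    sample = [
        "Reloaded at 2015-10-11 00:08:12 on fd-qlik-1 for 00:00:22 with 94,816 log entries.",
        "Reloaded at 2015-10-12 14:43:38 on fd-qlik-1 for 00:35:48 with 882,142 log entries.",
    ]
    for duration, entries in summarize(sample):
        print(f"{entries:>9,} entries  {duration}")
```

Lines that do not contain the finish wording (such as "Reload Start" entries) are skipped, so the whole stats file can be passed in line by line.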
        • Re: Operations Monitor Load Failing and taking an average of 35 Min when successful
          Tyler Waterfall

          Tim,

          The increase in log entries certainly seems fast - from 94k just after midnight to 360k about three hours later - and 882k by the next afternoon. The more log entries, the longer the reload will take - though it should scale better than it appears to in your case.
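
To put a rough number on "scaling worse than expected", the sketch below computes reload seconds per 1,000 log entries from the Reload Finish entries in the original post. The figures are copied from those entries (durations converted to seconds); the helper names are invented for illustration. If the reload scaled linearly, this cost would stay roughly flat; with these figures it rises roughly tenfold.

```python
# (reload duration in seconds, log entries), taken from the
# "Reload Finish" entries in the original post.
reloads = [
    (22, 94_816),      # 00:00:22
    (129, 142_198),    # 00:02:09
    (342, 232_630),    # 00:05:42
    (690, 360_970),    # 00:11:30
    (2140, 881_305),   # 00:35:40
]


def seconds_per_1k(seconds, entries):
    """Reload cost normalized to 1,000 log entries."""
    return seconds / (entries / 1000)


def cost_table(rows):
    """Return (entries, seconds per 1,000 entries) for each reload."""
    return [(entries, round(seconds_per_1k(secs, entries), 2)) for secs, entries in rows]


if __name__ == "__main__":
    for entries, cost in cost_table(reloads):
        print(f"{entries:>9,} entries  {cost:5.2f} s per 1,000 entries")
```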

          The ramp-up in reload duration and memory usage might also be related to an issue reported just late last week, in which hung tasks (in that case a user directory sync task) had started but were not terminated properly, resulting in many log entries and an extremely long reload duration (days...).

          Can you check the Operations Monitor > Reload details page and post a screenshot of it?

          I'm mainly looking for duration, so please capture the max duration in the table on that page, or in the dropdown filter "Reload Duration".

          Tyler

            • Re: Operations Monitor Load Failing and taking an average of 35 Min when successful
              Tim Weisbrod

              I already replaced the app with an empty copy (following directions from you to another user in a different post), and purged the logs.  Right now things all look pretty good, but if this issue comes back I will definitely post.

               

              Also, I'm not sure if this is the root cause or not, but it looks like the Archived Logs\Script folder is growing at a very fast pace.  That was the largest log folder out there by far, at about 868 MB.  I have 2 apps reloading now, one that reloads every 5 minutes and another that reloads every minute.  Since each file in this folder contains a copy of the reload script that was run, over and over, the logging seems a bit excessive.  For a server running frequent jobs like this, is there a way to reduce the logging volume created by reloads?  I looked under Repository settings in the QMC, but didn't see one that obviously mapped to reload scripts.  Currently the first 2 are set to Basic, and all the rest are set to Info.
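
In case it helps others check which log folder is growing, here is a small Python sketch that totals the size of each immediate subfolder under a parent directory, largest first. The function names are made up for this example, and the path in the usage note is only a placeholder; it is not a confirmed Qlik Sense default.

```python
import os


def folder_size_bytes(path):
    """Total size in bytes of all files under path, including subfolders."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished or is not readable; skip it
    return total


def sizes_by_subfolder(parent):
    """Map each immediate subfolder of parent to its size, largest first."""
    sizes = {
        entry.name: folder_size_bytes(entry.path)
        for entry in os.scandir(parent)
        if entry.is_dir()
    }
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))
```

Usage would look something like `sizes_by_subfolder(r"<archived logs root>\<machine name>")`, with the placeholders replaced by the real location on the server.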

            • Re: Operations Monitor Load Failing and taking an average of 35 Min when successful
              Tyler Waterfall

              Follow-up question about the Performance detail chart - can you describe 'expand performance summary'? Is it just clicking on one of the "+"?

              • Re: Operations Monitor Load Failing and taking an average of 35 Min when successful
                Tyler Waterfall

                Concerning purging the logs - do the following:

                1. Move the folder(s) inside "qlik\sense\repository\archived logs" to some other location (I would keep them, just in case). For single-node deployment, you should just have the one folder named after the machine.
                2. Move all QVDs from the qlik\sense\log folder to the same backup location you used for the archived logs in step 1.
                  Note - this might require some trickery if the QVDs get locked or you are not the user running the qlik sense services.
                3. Reload the Monitor apps. If you still have this issue, then you might want to remove some logs from qlik\sense\log\[service], but I doubt you will see that.
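
Steps 1 and 2 above could be scripted roughly as follows. This is a sketch, not a tested procedure for Qlik Sense: the function name `move_for_purge` and the `archived_logs` / `qvd` backup subfolder names are invented here, and the caller has to supply the real folder locations. It defaults to a dry run that only lists the planned moves. With `dry_run=False`, a locked QVD or a permissions problem will raise an error partway through, which is the "trickery" case mentioned in the note under step 2.

```python
import shutil
from pathlib import Path


def move_for_purge(archived_logs, log_dir, backup, dry_run=True):
    """Plan (and optionally perform) the moves from steps 1 and 2.

    Returns a list of (source, destination) pairs.
    """
    archived_logs, log_dir, backup = Path(archived_logs), Path(log_dir), Path(backup)
    moves = []

    # Step 1: every folder inside the archived logs directory.
    for item in sorted(archived_logs.iterdir()):
        if item.is_dir():
            moves.append((item, backup / "archived_logs" / item.name))

    # Step 2: QVD files sitting directly in the log folder.
    for item in sorted(log_dir.iterdir()):
        if item.is_file() and item.suffix.lower() == ".qvd":
            moves.append((item, backup / "qvd" / item.name))

    if not dry_run:
        for src, dst in moves:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    return moves
```

Running it first with the default `dry_run=True` and printing the returned pairs is a cheap way to confirm nothing unexpected would be moved. Step 3 (reloading the Monitor apps) still has to be done afterwards.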

                 

                However you purge the old logs, though, I would be interested in knowing more about the log situation when you were having issues. It will help in identifying any issues with the logs and/or the logic used by the Operations Monitor. Thanks! (You can post them or mail them directly to me: twa@qlik.com)

                • Re: Operations Monitor Load Failing and taking an average of 35 Min when successful
                  Tyler Waterfall

                  Two follow-ups for you.

                  (Updating this comment Oct 20th!)

                  1 High RAM, lots of calendar records, and long reload:  this was due to a bug in the Operations Monitor script.  The fix is easy, but I'm not sure how quickly it will become available. If you'd like to do your own surgery on the Operations Monitor (PROCEED AT YOUR OWN RISK), you can fix it by:

                  1) Duplicate the Operations Monitor in QMC (so you can update it)

                  2) Open the Data Load Editor

                  3) Go to the "SUB reloadSummary" section of script

                  4) Replace ProxySessionId with _proxySessionPackage on line 42 ONLY (not on lines 7, 23, and 42), or replace the entire section of load script "SUB reloadSummary" with the attached script.

                  5) Publish your updated Ops Monitor app to the Monitoring Apps stream, replacing the existing Operations Monitor

                   

                  If you decide you don't like the update, you can re-import the original Operations Monitor from Qlik\Sense\Repository\DefaultApps.

                   

                  2 Performance Pivot table high RAM.  R&D is aware of this situation and is investigating ways to improve its performance. I have no more details on that.