10 Replies. Latest reply: Jan 6, 2014 4:08 AM by Aadil M

    QV 11.2 SR2 QDS Issues

    Graeme Smith

      Hi,

       

      We recently upgraded from QV10 SR4 to QV11.2 SR2 and have noticed a lot of odd behaviour in the new QDS environment with regard to task scheduling, task execution, and task result status reporting in the QMC.  We have raised cases with QT support, but I thought I'd see whether anyone else is seeing similar issues.

       

      The main oddities we have noticed:

      1. Tasks not firing: some tasks have simply not executed at their scheduled time (e.g. a daily schedule that works fine for a week and then misses a day).
      2. Tasks not reporting error status back to the QMC correctly: e.g. a task reaches its timeout and fails, but does not show up as failed in the QMC.  This makes it extremely difficult to manage the production environment.
      3. Intermittent and seemingly random COM exception errors.  We are seeing about 10 of these per day.  This problem is coupled with issue 2 above, i.e. the failed statuses are not always reported back.  Here is an example:
         QDSMain.Exceptions.DistributionFailedException: Distribute failed with errors to follow. ---> QDSMain.Exceptions.ReloadFailedException: Reload failed ---> QDSMain.Exceptions.LogBucketErrorException: The sourcedocument failed to reload.. Exception=System.Runtime.InteropServices.COMException (0x800706BE): The remote procedure call failed. (Exception from HRESULT: 0x800706BE)

       

      We had exactly the same configuration on QV10 in terms of task schedules and dependencies (i.e. the QV10 documents and tasks were migrated across "as is"), and we did not see any of these issues on the QV10 SR4 QDS.

       

      The new environment is clustered (although we are currently running only one hot QDS node), and the servers have 40 cores and 256 GB RAM. The maximum number of QDS engines has been set to 40, and the heap size has been increased as per QV Support's recommendations for this hardware configuration.  The server doesn't appear to be resource-bound.  It would be great to know if anyone else is experiencing any strange behaviour with QV11.2 SR2.

       

      Any feedback would be great, even if it's just a "we're seeing no problems on version X".

       

      Thanks,

       

      Graeme

        • Re: QV 11.2 SR2 QDS Issues
          Jochem Zwienenberg

          Can you try recreating a task and see whether it runs correctly over a period of time?

          It seems you have a big environment with probably a lot of tasks.

          I don't think this is the solution you are waiting for, but maybe you can tackle your problem this way.

           

          Otherwise, it's better to wait for an answer from QlikView so they can tackle this problem.

           


            • Re: QV 11.2 SR2 QDS Issues
              Graeme Smith

              Hi Jochem,

               

              All of the tasks in this environment were created via the QMS API (I wrote some code to migrate the tasks from our QV10 environment).  Interesting point, though: we could try creating some tasks manually and see whether they are affected.  The difficulty with these issues, however, is that they are intermittent, so it's very hard to tell whether a new task has not been affected because it is 'lucky' or because creating it manually has actually fixed the problem.

               

              I really hope that is not the cause, as I don't like the idea of creating 1000+ tasks by hand; I think it would probably create more problems than it fixes!

               

              QV are investigating (we have a few cases open), but I just wanted to get some feedback from other users.

               

              Thanks for your input.

               

              Regards,

               

              Graeme

            • Re: QV 11.2 SR2 QDS Issues
              Graeme Smith

              For anyone else experiencing similar types of symptoms:

               

              Issues 1 and 2 from my original post appear to have been caused by a combination of two problems:

              1. What look like bugs in the QDS clustering logic, causing tasks to be "lost" and statuses not to be reported back when one of the QDS nodes in the cluster is on warm standby (i.e. the server is a member of the cluster, but the QDS service is stopped).  QT are investigating, but since we removed the standby node from the cluster, the issues appear to have calmed down.
              2. The QMS chunk size issue outlined above.

               

              Issue 3 has been acknowledged as a bug and is currently under investigation by QT Support.

               

              There may be a related bug fixed in 11.2 SR4, but I don't yet have enough information to say whether it will fix the problems we were seeing:

               

              61946 Removing QDS from Cluster - Tasks Still Load Balance to Removed Node

               

              Thanks for your input, gainkarthi.