8 Replies Latest reply: Jun 27, 2017 9:46 AM by Bruno Silva

    Reduce the reload time to within 1 hour or 45 min

    vinayak lakeshri

      Reduce the reload time to within one hour.

      The load is taking a long time because of summing the measures while grouping fields. Does anybody have a solution for this, or a way to reduce the reload time taken by the GROUP BY clause?

      The GROUP BY uses 8 fields for 5 measures. I tried the option below, but it's not working properly.

      Can anybody help me reduce the reload time by optimizing the GROUP BY?

       

      Option 1:

       

      Orders:

      LOAD EmployeeID,

                Year,

                Sales,

                ....

      FROM Orders;

       

      Left Join (Orders)

      LOAD EmployeeID,

          Year,

          Sum(Sales) AS TotalSales

      Resident Orders

      Group By EmployeeID, Year;

        • Re: Reduce the reload time to within 1 hour or 45 min
          vinayak lakeshri

          Please note that the data volume is 20 million rows.

            • Re: Reduce the reload time to within 1 hour or 45 min
              Bruno Silva

              Hi,

               

              20 million doesn't seem like that much. Where are you extracting the data from? Are you sure you don't have a network performance problem?

              Besides that, do you really need to group by Year? I don't know your requirements, but personally I don't see the need for it. Sum is quite a fast operation in the front end, so if your information is already grouped by Employee, it should be fast to get the yearly sum from that data.

               

              Please clarify your requirement and explain in more detail how you are extracting the data, so people can help in a more focused way.

               

              Kind Regards.

            • Re: Reduce the reload time to within 1 hour or 45 min
              kushal chawda

              Perform grouping only on incremental data
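
              A rough sketch of that incremental pattern in Qlik script. All names here (AggOrders.qvd, OrderTimestamp, vLastReload) are placeholders, not taken from this thread, and the details depend on how new rows are identified in your source. Because Sum is additive, old and new partial sums can safely be re-aggregated:

              NewAgg: // aggregate only the records added since the last reload
              LOAD EmployeeID,
                   Year,
                   Sum(Sales) AS TotalSales
              RESIDENT Orders
              WHERE OrderTimestamp >= '$(vLastReload)'
              GROUP BY EmployeeID, Year;

              Concatenate (NewAgg) // append the totals from the previous run
              LOAD EmployeeID, Year, TotalSales
              FROM AggOrders.qvd (qvd);

              AggOrders: // collapse old and new partial sums per group
              NOCONCATENATE
              LOAD EmployeeID,
                   Year,
                   Sum(TotalSales) AS TotalSales
              RESIDENT NewAgg
              GROUP BY EmployeeID, Year;

              DROP TABLE NewAgg;
              STORE AggOrders INTO AggOrders.qvd (qvd);

              The expensive GROUP BY then only runs over the new rows plus the already aggregated table, which is where the time saving comes from.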

                • Re: Reduce the reload time to within 1 hour or 45 min
                  vinayak lakeshri

                  Data volume is 200 million, sorry, not 20 million.

                  Data loads fast from the QVD and via resident load, but summing the measures (6 fields) with a GROUP BY over 8 fields takes more than 30 minutes and uses 90% of the available memory.

                  We also have to add a calculation on the date, because orders picked until 3 o'clock the next day have to be recorded on the current day.

                  This will take even more time and push the reload beyond the limits of our timeframe.
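
                  That 3 o'clock cutoff can usually be handled with a single scalar expression while loading, so it adds a small per-row cost but no extra aggregation pass. A sketch, assuming a full pick timestamp is available (PickTimestamp and OrderDate are placeholder names): shifting the timestamp back three hours before flooring it to a date assigns everything up to 03:00 of the next day to the current day:

                  // 3/24 = three hours as a fraction of a day; field names are placeholders
                  Date(Floor(PickTimestamp - 3/24)) AS OrderDate

                  If OrderDate is computed in a preceding load, the later GROUP BY can use it directly.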

                   

                  Does anyone have an idea where to make changes, and where best to add the calculation?

                  - build up WHERE Exists() statements?

                  - speed up the GROUP BY?

                   

                  I expect there will be multiple "little" solutions, so every idea is helpful.


                  No need for a full new script, just tips and tricks; I will figure it out from there.

                    • Re: Reduce the reload time to within 1 hour or 45 min
                      Marcus Sommer

                      200 million records is quite a lot of data for an aggregation load with 6 measures over 8 fields. This will never be fast, and your load times of 30 to 60 minutes aren't really bad. You might save time by removing some of the fields: keep only one or two key fields in the GROUP BY and add the other fields back afterwards via joining or mapping. A vertical and/or horizontal split of the task, recombining the parts later, might also be useful, especially if your system runs out of memory and starts swapping to virtual RAM. Another point that might reduce the load time of this task is having the data already sorted when it is loaded.
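
                      Keeping only key fields in the GROUP BY and re-attaching the rest afterwards via mapping could look like this; Department and MapDept are illustrative names only, standing in for one of the descriptive fields removed from the aggregation:

                      MapDept: // mapping for a field removed from the GROUP BY
                      MAPPING LOAD DISTINCT EmployeeID, Department
                      RESIDENT Orders;

                      AggOrders:
                      LOAD EmployeeID,
                           Year,
                           ApplyMap('MapDept', EmployeeID) AS Department, // re-attach
                           Sum(Sales) AS TotalSales
                      RESIDENT Orders
                      GROUP BY EmployeeID, Year; // only key fields drive the aggregation

                      ApplyMap() over a field that is already in the GROUP BY is allowed, so the aggregation itself runs over two fields instead of eight.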

                       

                      But before playing with a variety of rather small improvements, I suggest using an incremental approach to loading and transforming the data. You will find various postings on this topic, and on how data can be loaded in optimized mode, here: Advanced topics for creating a qlik datamodel.

                       

                      - Marcus

                  • Re: Reduce the reload time to within 1 hour or 45 min
                    Peter Cammaert

                    Try this code to see whether, and by how much, it reduces the time needed to calculate the two tables. Check first with 100 million or even 25 million rows:

                     

                    Orders:

                    LOAD EmployeeID,

                         Year,

                         EmployeeID & '-' & Year AS EmployeeYear,

                         Sales,

                         ....

                    FROM Orders;

                     

                    EmployeeSales: // Leave aggr. data in separate table, no join

                    NOCONCATENATE

                    LOAD EmployeeYear,

                         Sum(Sales) AS Sales

                    RESIDENT Orders

                    GROUP BY EmployeeYear;

                    • Re: Reduce the reload time to within 1 hour or 45 min
                      Bruno Silva

                      Like Peter, I would suggest creating an ID for the 8 fields you need to aggregate. Create it with AutoNumber() so it performs faster, then use it in the aggregation. Later you can LEFT JOIN the result to your initial table, or use the ID field to link both tables.

                       

                      Either way you will get a lot of distinct IDs, and as said before, 200 million rows will always take time to aggregate. Be sure to aggregate over the minimum number of fields possible.
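
                      A sketch of the AutoNumber approach; only the two grouping fields from the original example are shown (the real script would concatenate all eight), and GrpKey/GroupID/TotalSales are placeholder names:

                      Orders:
                      LOAD EmployeeID,
                           Year,
                           // compact integer surrogate for the grouping-field combination
                           AutoNumber(EmployeeID & '|' & Year, 'GrpKey') AS GroupID,
                           Sales
                      FROM Orders;

                      EmployeeSales:
                      NOCONCATENATE
                      LOAD GroupID,
                           Sum(Sales) AS TotalSales
                      RESIDENT Orders
                      GROUP BY GroupID; // one integer field instead of eight

                      Because GroupID exists in both tables, Qlik links them automatically; alternatively, LEFT JOIN (Orders) on GroupID if you want everything in one table.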