vlakeshr
Creator

Reduce the reload time to within 1 hour or 45 minutes

Reduce the reload time to within one hour.

The load takes a long time when summing measures with a GROUP BY clause. Does anybody have a solution for this, or a way to reduce the reload time taken by the GROUP BY?

The GROUP BY uses 8 fields for 5 measures. I tried the option below, but it's not working properly.

Can anybody help me reduce the reload time by optimizing the GROUP BY?

Option 1:

Orders:
LOAD EmployeeID,
     Year,
     Sales,
     ....
FROM Orders;

Left Join (Orders)
LOAD EmployeeID,
     Year,
     Sum(Sales) AS SalesSum // name the aggregate; otherwise the field is literally called "Sum(Sales)"
Resident Orders
Group By EmployeeID, Year;

8 Replies
vlakeshr
Creator
Author

Please note that the data volume is 20 million rows.

Kushal_Chawda

Perform grouping only on incremental data
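
A rough sketch of that incremental pattern; the QVD names, the ModifiedDate field and the vLastReload variable are assumptions for illustration:

// Load the aggregate produced by the previous run
Agg:
LOAD EmployeeID, Year, SalesSum
FROM AggOrders.qvd (qvd);

// Aggregate only the rows added/changed since the last reload
// and stack them onto the old aggregate
Concatenate (Agg)
LOAD EmployeeID,
     Year,
     Sum(Sales) AS SalesSum
FROM Orders.qvd (qvd)
WHERE ModifiedDate >= '$(vLastReload)'
GROUP BY EmployeeID, Year;

// Collapse old + new back to one row per key - this GROUP BY now
// runs over far fewer rows than a full re-aggregation
Result:
NOCONCATENATE
LOAD EmployeeID, Year, Sum(SalesSum) AS SalesSum
RESIDENT Agg
GROUP BY EmployeeID, Year;
DROP TABLE Agg;

STORE Result INTO AggOrders.qvd (qvd);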

Anonymous
Not applicable

Hi,

20 million doesn't seem like that much. Where are you extracting that data from? Are you sure that you don't have a network performance problem?

Besides that, do you really need to group by Year? I don't know your requirements, but personally I don't see the need for it. Sum is quite a fast operation in the frontend, so if your information is already grouped by Employee, it should be fast to get the yearly sum of that data in a chart.
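
For example, a plain straight table aggregates on the fly, with no script-side GROUP BY needed (chart objects are configured in the UI, so this is only a sketch of the setup):

// Chart dimensions: EmployeeID, Year
// Chart expression:
Sum(Sales)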

Please clarify your requirement and explain in more detail how you are extracting the data, so people can help in a more focused way.

Kind Regards.

Not applicable

Following..

vlakeshr
Creator
Author

Sorry, it's not 20 million; the data volume is 200 million.

Data loads fast from the QVD and the resident load, but summing the measures (6 fields) grouped by 8 fields takes over 30 minutes and uses 90% of the available memory.

We also have to add a calculation on the date, because orders picked until 3 o'clock the next day have to be recorded on the current day. This will take even more time, pushing the reload beyond the limits of our timeframe.

Does anyone have an idea of what to change, and where to add the calculation, as best practice?

- Build up the Where Exists() statements?

- Speed up the GROUP BY?

I expect there will be multiple "little" solutions, so every idea is helpful.


No need for a full new script, just tips and tricks; I will figure it out from there.
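
To make the date requirement concrete, the cutoff could be expressed as a pure row-by-row calculation; PickTimestamp is a hypothetical stand-in for our real timestamp field:

// Orders picked before 03:00 count towards the previous day:
// shift the timestamp back by 3 hours, then truncate to a date
LOAD *,
     Date(Floor(PickTimestamp - 3/24)) AS OrderDay
FROM Orders;

Because this is a scalar expression, it can be applied in the same pass as the QVD load, before any GROUP BY.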

marcus_sommer

200 million records is quite a lot of data for an aggregation load with 6 measures over 8 fields. This will never be fast, and your load times of 30-60 minutes aren't really bad. You might save time if you could remove some of the fields, keeping only one or two key fields and adding the other fields back afterwards per joining/mapping; see the sketch below. A vertical and/or horizontal split of the task, recombining the parts afterwards, might also be useful, especially if your system runs out of memory and starts swapping to virtual RAM. Another point that might reduce the load time of this task is whether the data are already sorted when they are loaded.
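
A minimal sketch of the slim-key idea, reusing the field names from the original post; EmployeeName is a hypothetical descriptive attribute:

// 1) Aggregate on a slim key set only - fewer GROUP BY fields, less memory
AggOrders:
LOAD EmployeeID,
     Year,
     Sum(Sales) AS SalesSum
RESIDENT Orders
GROUP BY EmployeeID, Year;

// 2) Map descriptive attributes back in afterwards instead of
//    carrying them through the GROUP BY
MapEmpName:
MAPPING LOAD DISTINCT EmployeeID, EmployeeName
RESIDENT Orders;

Result:
NOCONCATENATE
LOAD EmployeeID,
     ApplyMap('MapEmpName', EmployeeID, Null()) AS EmployeeName,
     Year,
     SalesSum
RESIDENT AggOrders;
DROP TABLES AggOrders, Orders;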

But before playing with a variety of rather small improvements, I suggest using an incremental approach to loading and transforming the data. Here you will find various postings on this topic and on how data can be loaded in an optimized mode: Advanced topics for creating a qlik datamodel.

- Marcus

Peter_Cammaert
Partner - Champion III

Try this code to see whether, and by how much, it reduces the time needed to calculate the two tables. Check with 100 million or even 25 million rows first:

Orders:
LOAD EmployeeID,
     Year,
     EmployeeID & '-' & Year AS EmployeeYear,
     Sales,
     ....
FROM Orders;

EmployeeSales: // Leave aggr. data in separate table, no join
NOCONCATENATE
LOAD EmployeeYear,
     Sum(Sales) AS SalesSum // renamed so the tables link on EmployeeYear only, not via a synthetic key on Sales
RESIDENT Orders
GROUP BY EmployeeYear;

Anonymous
Not applicable

Just like Peter, I would suggest creating an ID for those 8 fields that you need to aggregate on. Create it with AutoNumber so it performs faster, and then use it in the aggregation. Later you can Left Join the result back to your initial table, or use that ID field to connect the two tables; see the sketch below.

Either way you will get a lot of distinct IDs, and as was said before, 200 million rows will always take time to aggregate. Be sure to aggregate on the minimum number of fields possible.
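
A hedged sketch of that AutoNumber approach; Region and Product are hypothetical stand-ins for the remaining grouping fields, which would all go into the key:

// Build one compact integer key over the grouping fields
// (extend the concatenation with all 8 fields in the real script)
Keyed:
NOCONCATENATE
LOAD *,
     AutoNumber(EmployeeID & '|' & Year & '|' & Region & '|' & Product) AS GroupID
RESIDENT Orders;
DROP TABLE Orders;

// Aggregate on the single integer key instead of 8 separate fields
Agg:
LOAD GroupID,
     Sum(Sales) AS SalesSum
RESIDENT Keyed
GROUP BY GroupID;

// Either Left Join the result back, or keep Agg as a separate
// table linked to Keyed through GroupID
Left Join (Keyed)
LOAD GroupID, SalesSum
RESIDENT Agg;
DROP TABLE Agg;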