Skip to main content
Announcements
Document boards are being consolidated, this board no longer allows NEW documents READ MORE

Optimize Group By Performance

cancel
Showing results for 
Search instead for 
Did you mean: 
sunny_talwar

Optimize Group By Performance

Last Update:

Sep 21, 2022 5:11:51 PM

Updated By:

Sue_Macaluso

Created date:

Dec 29, 2017 9:44:36 AM

One of our Qlik Community Buddy (Rupinder Singh) shared an awesome way to optimize the performance of group by statement but I don't think this got enough lime light which it should have. In order to give it better exposure, I am creating this document from this thread Optimize Group By

For below I am using QV version - 11.2 SR4

I have always had trouble with aggregating my data as "Group by" statements in scripts have always taken too long for my comfort.

For example, to group by  23 million records (2 columns - ID and Sales) it takes me 10 mins.

Table_Group:

Load ID,Sum(Sales)

Resident Table

Group By ID;

Instead of this, If I Order by my data by ID and then run the same Group by clause, I get results backs in less that 2 mins.

Table_Sort:

Load ID,Sales

Resident Table

Order by ID;

Table _Group:

NoConcatenate Load ID,Sum(Sales)

Resident Table_Sort

Group By Table_Sort;

I would have believed QV to internally perform this step, but explicitly implementing this has improved my query times by 500%.

I have tables with 40 columns (36 of which to be grouped by and 4 to be summed) and the query time has gone from 18 to 6 mins.

I have just ensured that my Order By column list is the same as Group By Column List.

So, If Order by is "Order By Column1,Column2,Column3" then Group by is "Group By Column1,Column2,Column3"

I would love to hear back if others in the community also have faced similar challenges and whether this or any other method has helped them improve Group by Performance.

Comments
prieper
Master II
Master II

Just checked on a "living" example: Aggregation of some 18 fields over 6 dimensions out of 7.8 Mio Records into a new table with 7 Mio Records (admittingly not a huge summarization, but needed for some datacleansing): actually it took 24:47 instead of 23:54 with the initial file.

So the answer may be: depends ...

0 Likes
vvvvvvizard
Partner - Specialist
Partner - Specialist

Excellent tip

0 Likes
YoussefBelloum
Champion
Champion

Please give it a try, because opinions are really mixed regarding its effectiveness.

when you have time, please share with us your benchmarking

0 Likes
dvickers
Contributor III
Contributor III

This is an excellent tip, going to try this out on a table we need to group by where the lowest grain is 35M rows.  Do you know why sorting the data first like this in Qlik is the reason why script performs better? 

0 Likes
Cory
Contributor II
Contributor II

WOW - thank you - working with a data set of 4.5M rows, I had a section that was taking 100-110 seconds to load with just a group by.  When I followed the above suggestion it dropped to about 15 seconds. 

0 Likes
basav
Creator
Creator

Hi All, 

I tested this technique on a 173 million records data, Without Orderby, it took 10 min for one aggregation in groupby.  And Final Run of the overall app was 10min

With using order by before group by it actually took 4min to load Group by table with one aggregation of  173 million rows.  And Final App run time is still same 10min. 

There is no changes on overall app run time, it only optimized group by performance.  Because whatever the time we save for groupby, orderby will eat that time. So overall app run time has no improvement.

Still, very good and useful technique. Thank you. 

0 Likes
Version history
Last update:
‎2022-09-21 05:11 PM
Updated by: