Skip to main content
Announcements
NEW: Seamless Public Data Sharing with Qlik's New Anonymous Access Capability: TELL ME MORE!
cancel
Showing results for 
Search instead for 
Did you mean: 
rupindo_barcap
Contributor II
Contributor II

Optimize Group By

For below I am using QV version - 11.2 SR4

I have always had trouble with aggregating my data as "Group by" statements in scripts have always taken too long for my comfort.

For example, to group by  23 million records (2 columns - ID and Sales) it takes me 10 mins.

Table_Group:

Load ID,Sum(Sales)

Resident Table

Group By ID;

Instead of this, If I Order by my data by ID and then run the same Group by clause, I get results backs in less that 2 mins.

Table_Sort:

Load ID,Sales

Resident Table

Order by ID;

Table _Group:

NoConcatenate Load ID,Sum(Sales)

Resident Table_Sort

Group By Table_Sort;

I would have believed QV to internally perform this step, but explicitly implementing this has improved my query times by 500%.

I have tables with 40 columns (36 of which to be grouped by and 4 to be summed) and the query time has gone from 18 to 6 mins.

I have just ensured that my Order By column list is the same as Group By Column List.

So, If Order by is "Order By Column1,Column2,Column3" then Group by is "Group By Column1,Column2,Column3"

I would love to hear back if others in the community also have faced similar challenges and whether this or any other method has helped them improve Group by Performance.

18 Replies
Not applicable

We are running into the same problem with our group bys. We are attempting to get the max date at each account we have for a massive fact table. We added a load resident and an order by before the group by, but it is hard to tell if the order by is improving our run time.hic‌‌ Any insight or better way to load and group data?

DavidFoster1
Specialist
Specialist

Rupinder

Do your timings include the time to perform the sort, or just the group by?

rupindo_barcap
Contributor II
Contributor II
Author

Both Sort and Group By

liam_hanninen
Creator
Creator

This improved my reload time by about 55% in one instance with 25 columns and one group by/order by field. Thanks!

felipedl
Partner - Specialist III
Partner - Specialist III

Awsome strategy Rupinder, just had the same problem and this solution gave a huge performance improvement.

Glad some tried it and helpfully posted it here.

Many thanks!!!!

Felipe.

sunny_talwar

This did work... I didn't think it would, but it did... thanks for showing this.

Best,

Sunny

DavidFoster1
Specialist
Specialist

What does grouping by the table name do?

jonopitchford
Contributor III
Contributor III

Amazing

Anonymous
Not applicable

Thanks Rupinder