Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
rupindo_barcap
Contributor II
Contributor II

Optimize Group By

For below I am using QV version - 11.2 SR4

I have always had trouble with aggregating my data as "Group by" statements in scripts have always taken too long for my comfort.

For example, to group by  23 million records (2 columns - ID and Sales) it takes me 10 mins.

Table_Group:

Load ID,Sum(Sales)

Resident Table

Group By ID;

Instead of this, If I Order by my data by ID and then run the same Group by clause, I get results backs in less that 2 mins.

Table_Sort:

Load ID,Sales

Resident Table

Order by ID;

Table _Group:

NoConcatenate Load ID,Sum(Sales)

Resident Table_Sort

Group By Table_Sort;

I would have believed QV to internally perform this step, but explicitly implementing this has improved my query times by 500%.

I have tables with 40 columns (36 of which to be grouped by and 4 to be summed) and the query time has gone from 18 to 6 mins.

I have just ensured that my Order By column list is the same as Group By Column List.

So, If Order by is "Order By Column1,Column2,Column3" then Group by is "Group By Column1,Column2,Column3"

I would love to hear back if others in the community also have faced similar challenges and whether this or any other method has helped them improve Group by Performance.

18 Replies
HirisH_V7
Master
Master

Excellent Will give a try over this.
HirisH
“Aspire to Inspire before we Expire!”
HirisH_V7
Master
Master

Folks i am trying to achieve this with my group by, but i am getting this error. I don't understand, if its working there. 😞

@sunny_talwar  @rupindo_barcap  @jonopitchford  Pl help me.

Capture.JPG

 

HirisH
“Aspire to Inspire before we Expire!”
jonopitchford
Contributor III
Contributor III

Try group by ID
it looks like they have accidentally grouped by the table name and not the field name
sunny_talwar

I think you are grouping by Table_Sort whereas you might want to Group By ID 🙂

HirisH_V7
Master
Master

Thanks!

But Post above directed me such way, bit confused even when i am trying. Anyway i will check performance this way.

HirisH
“Aspire to Inspire before we Expire!”
HirisH_V7
Master
Master

Many Thanks! It's actually working.

My reload time was decreased up-to some extent, anyway i will test it on huge data sets as well. As per my understanding in this procedure the order by is doing the required alignment of data for group by, which actually helps Group by to work on spontaneously with every unique data field/no. Beside checking for field/no to do grouping on-every individual occurrence.

Thanks,
HirisH

HirisH
“Aspire to Inspire before we Expire!”
pratyushraizada
Contributor II
Contributor II

I'm using Qlik sense 13.42.1.

I think Qlik has now optimized this. In case we prefix the 'group by' with 'order by' operation, the data model takes a few seconds more to load.

Saravanan_Desingh

Thanks for sharing this.

basav
Creator
Creator

Hi All, 

I tested this technique on a 173 million records data, Without Orderby Sort, it took 10 min for one aggregation in groupby. 

With using order by before group by it actually took 4min to load the 173 million data. 

 

However,  There is no changes on overall app run time, it only optimized group by performance.  Because whatever the time we save for groupby, orderby will eat that time. So overall app run time has no improvement.

Still, very good and useful technique. Thank you.