Qlik Community

QlikView Documents

Documents for QlikView related information.

Optimize Group By Performance

Optimize Group By Performance

One of our Qlik Community Buddy (Rupinder Singh) shared an awesome way to optimize the performance of group by statement but I don't think this got enough lime light which it should have. In order to give it better exposure, I am creating this document from this thread Optimize Group By

For below I am using QV version - 11.2 SR4

I have always had trouble with aggregating my data as "Group by" statements in scripts have always taken too long for my comfort.

For example, to group by  23 million records (2 columns - ID and Sales) it takes me 10 mins.

Table_Group:

Load ID,Sum(Sales)

Resident Table

Group By ID;

Instead of this, If I Order by my data by ID and then run the same Group by clause, I get results backs in less that 2 mins.

Table_Sort:

Load ID,Sales

Resident Table

Order by ID;

Table _Group:

NoConcatenate Load ID,Sum(Sales)

Resident Table_Sort

Group By Table_Sort;

I would have believed QV to internally perform this step, but explicitly implementing this has improved my query times by 500%.

I have tables with 40 columns (36 of which to be grouped by and 4 to be summed) and the query time has gone from 18 to 6 mins.

I have just ensured that my Order By column list is the same as Group By Column List.

So, If Order by is "Order By Column1,Column2,Column3" then Group by is "Group By Column1,Column2,Column3"

I would love to hear back if others in the community also have faced similar challenges and whether this or any other method has helped them improve Group by Performance.

Comments
johncaqc
Valued Contributor

Interesting. How long does it take to run the initial Order By portion of this?

I have not checked the individual component, but this is what I found

Only Group By - 15 minutes

Order By followed by Group By - 5 minutes

If you think you would want me to break it down, I can check the log files to see what the break up was, but seeing this improvement, I didn't even think it was necessary to check the Order By time vs Group By time

johncaqc
Valued Contributor

Not necessary Sunny, I can play and do some of my own analysis...I have some scenarios like this I should probably optimize this way. I find it very interesting though, and appreciate that I was reminded of this.

radoresky
Contributor

Hi,

I found this to be very interesting so I ran some tests on Qlik Sense November 2017. I used a 37 mil. dataset with 13 fields, 8 used in group by, 4 summed and 1 counted(distinct).

The Group By statement was really executed faster by almost a half, but the total time (with ordering the dataset first) was just a little less - 13:30 minutes instead of 15 minutes of just group by.

And, when I ran the two loads without the count(distinct) aggregation, the simple Group By was even faster then the total time with ordering first (10:10 instead of 10:30 minutes).

So I guess the more complex the aggregation (and more data), the more it makes sense to order first. And, there cannot be any other fields loaded, only those used in Group By or aggregated, because the ordering will actually take longer than what is "saved". And, any "Where" statement will kill the performance too.

YoussefBelloum
Esteemed Contributor

Very good ! I'll test it !

Remind me of another post related to loading performance

It's not just your old fashioned INLINE statement

Thanks for sharing this. I do use this technique a lot for doing qvw optimized loads.

mohsinqlik
New Contributor III

Just want to understand working behind less time.

Why does taking a resident of sorted data before doing actual group by takes less time.

Below load statement

Table_Group:

Load ID,Sum(Sales)

Resident Table

Group By ID

order by ID;


will be performing the same operation won't it ?

I don't know what logic it takes, but there are some questions that can only be answered by testing or if somebody from Qlik can answer them. Let's see what hic‌ thinks about this and if he can share some of his knowledge and expertise on this matter.

No.

As fas as I can see/find out, Order By in a Group By Load has NO effect.

Yes, I think that's strange too. But it would explain a lot of the mysteries surrounding this optimization.

Arjunarao
Honored Contributor II

Good documentstalwar1‌. I was also used this technique in my previous project.

Version history
Revision #:
1 of 1
Last update:
‎12-29-2017 09:44 AM
Updated by: