Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Why does Count(distinct) performe worse (here) than sum(counterField)

Hi Qlikcommunity,

Yesterday I came across something that surprised me…

I saw a .qvw that was performing _very_ slowly on 1 tab of the document.

After analyzing, I noticed that in 1 chart, there were multiple expressions using count(distinct fieldname), so I thought this can be done more efficient, using sum(counterfield) (where couterfield is a field in the datamodel loaded with 1 as counterfield).

Surprisingly when I cloned that table, updated all expressions (removing count(distinct) and replaced it with sum(counterfield)… The performance of the NEW chart object seemed even worse than the original one.

Does anybody have an idea why this could be the case?

The relevant part of the datamodel is like this

PaymentsTable (estimated? 50 mln rows?)

%relationKeyProduct AmountFlagNegative
1234
-51
1234
150
1235
100

RelationTable (about 2.5 mln records)

%relationKeyRelationCounter
12341
12351

----->note that with %relationKey these 2 tables are related in the datamodel

----->note also that %relationKey is unique in the relationstable but NOT unique in the payments table (one to many)

Original expression:

count(distinct {<FlagNegative={1}>}  %relationKey)

New expression:

sum({<FlagNegative={1}>}  RelationCounter)

My understanding was that count(distinct) forces Qlikview to work with only 1 CPU instead of all CPU's. Therefore I would have expected the "new expression" to be faster...

My questions

[1] - explanation on why new expression does NOT seem faster

[2] - hints on how I can optimize the front-end in another way...

7 Replies
Not applicable
Author

Anyone ideas on thisone?

Is at least the issue I mention clear?

chematos
Specialist II
Specialist II

About count and sum, usually its better to use sum so I have no explanation about your first question.

And about the second one, I have the same problem and anybody answered to me, may be with conditionals you could perform a little the objects...

Gysbert_Wassenaar

If I had to guess, I would say that count(distinct {<FlagNegative={1}>}  %relationKey) needs only one table to calculate the count, whereas sum({<FlagNegative={1}>}  RelationCounter) needs both tables. And count can probably use an internal index on %relationKey because of the compression qlikview does.

You could try adding a FlagNegative field to RelationTable and then try sum over that. Then you won't need the set analysis expression anymore so that should speed things up a bit.


talk is cheap, supply exceeds demand
Not applicable
Author

@Gysbert: Thanks for your reply.

I indeed think your explanation could be right, although I have also seen that the sum in 2 tables was faster than the count distinct.

Perhaps in this case the count has a relative small amount of records to count, causing the count(distinct) to be faster.

Unfortunately that it is difficult to predict. I liked the "clear and straight rule of the thumb" to always avoid (and replace) count(distinct).

About your 2nd remark: I don't see how I can add FlagNegative to the RelationTable, because this table contains all relations (unique) and the other table contains payments, where relationKeys exists multiple time (and flagNegative =1 when the payment is <0).

@José Mª Tos

What did you mean by your 2nd remark (add conditionals)?

Did you mean a calculation condition (so the object will only be calculated when certain selections are made)? Or other things?

chematos
Specialist II
Specialist II

Yes, the object will only be calculated when certain selections are made. I saw that your second table could be large because of the 2.5 minutes to load the data, may be is not interesting to show all the rows in the object.

I was really thinking more in this thread I started:

http://community.qlik.com/message/274216#274216

If you have any ideas, will be apreciated.

Regards

Vegar
MVP
MVP

Hi Roberto.

Roberto Postma wrote:

[...]

New expression:

sum({<FlagNegative={1}>}  RelationCounter)

[...]

My questions

[1] - explanation on why new expression does NOT seem faster

[2] - hints on how I can optimize the front-end in another way...

[2]: If the possible values for your FlagNegative is 1 and 0 OR 1 and NULL you can skip the SET and instead use

sum(RelationCounter * FlagNegative)

Best regards

Vegar Lie Arntsen

QlikView consultant at egbs consulting ab

Blog (in Swedish): bi-effekten.se

Not applicable
Author

Nice idea, I'll try that and see if it is faster.