Why does Count(distinct) performe worse (here) tha... - Qlik Community

Report Inappropriate Content · ‎2012-10-25

Hi Qlikcommunity,

Yesterday I came across something that surprised me…

I saw a .qvw that was performing _very_ slowly on 1 tab of the document.

After analyzing, I noticed that in 1 chart, there were multiple expressions using count(distinct fieldname), so I thought this can be done more efficient, using sum(counterfield) (where couterfield is a field in the datamodel loaded with 1 as counterfield).

Surprisingly when I cloned that table, updated all expressions (removing count(distinct) and replaced it with sum(counterfield)… The performance of the NEW chart object seemed even worse than the original one.

Does anybody have an idea why this could be the case?

The relevant part of the datamodel is like this

PaymentsTable (estimated? 50 mln rows?)

%relationKey	Amount	FlagNegative
1234	-5	1
1234	15	0
1235	10	0

RelationTable (about 2.5 mln records)

%relationKey	RelationCounter
1234	1
1235	1

----->note that with %relationKey these 2 tables are related in the datamodel

----->note also that %relationKey is unique in the relationstable but NOT unique in the payments table (one to many)

Original expression:

count(distinct {<FlagNegative={1}>} %relationKey)

New expression:

sum({<FlagNegative={1}>} RelationCounter)

My understanding was that count(distinct) forces Qlikview to work with only 1 CPU instead of all CPU's. Therefore I would have expected the "new expression" to be faster...

My questions

[1] - explanation on why new expression does NOT seem faster

[2] - hints on how I can optimize the front-end in another way...

Report Inappropriate Content · ‎2012-10-29

Anyone ideas on thisone?

Is at least the issue I mention clear?

chematos · ‎2012-10-31

About count and sum, usually its better to use sum so I have no explanation about your first question.

And about the second one, I have the same problem and anybody answered to me, may be with conditionals you could perform a little the objects...

Gysbert_Wassenaar · ‎2012-10-31

If I had to guess, I would say that count(distinct {<FlagNegative={1}>} %relationKey) needs only one table to calculate the count, whereas sum({<FlagNegative={1}>} RelationCounter) needs both tables. And count can probably use an internal index on %relationKey because of the compression qlikview does.

You could try adding a FlagNegative field to RelationTable and then try sum over that. Then you won't need the set analysis expression anymore so that should speed things up a bit.

talk is cheap, supply exceeds demand

Report Inappropriate Content · ‎2012-11-01

@Gysbert: Thanks for your reply.

I indeed think your explanation could be right, although I have also seen that the sum in 2 tables was faster than the count distinct.

Perhaps in this case the count has a relative small amount of records to count, causing the count(distinct) to be faster.

Unfortunately that it is difficult to predict. I liked the "clear and straight rule of the thumb" to always avoid (and replace) count(distinct).

About your 2nd remark: I don't see how I can add FlagNegative to the RelationTable, because this table contains all relations (unique) and the other table contains payments, where relationKeys exists multiple time (and flagNegative =1 when the payment is <0).

@José Mª Tos

What did you mean by your 2nd remark (add conditionals)?

Did you mean a calculation condition (so the object will only be calculated when certain selections are made)? Or other things?

chematos · ‎2012-11-01

Yes, the object will only be calculated when certain selections are made. I saw that your second table could be large because of the 2.5 minutes to load the data, may be is not interesting to show all the rows in the object.

I was really thinking more in this thread I started:

http://community.qlik.com/message/274216#274216

If you have any ideas, will be apreciated.

Regards

Vegar · ‎2012-11-01

Hi Roberto.

Roberto Postma wrote:
[...]
New expression:
sum({<FlagNegative={1}>} RelationCounter)
[...]
My questions
[1] - explanation on why new expression does NOT seem faster
[2] - hints on how I can optimize the front-end in another way...

[2]: If the possible values for your FlagNegative is 1 and 0 OR 1 and NULL you can skip the SET and instead use

sum(RelationCounter * FlagNegative)

Best regards

Vegar Lie Arntsen

QlikView consultant at egbs consulting ab

Blog (in Swedish): bi-effekten.se

Vegar
Qlik Community MVP

Report Inappropriate Content · ‎2012-11-01

Nice idea, I'll try that and see if it is faster.

Why does Count(distinct) performe worse (here) than sum(counterField)