Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

How To Sum Distinct Dimension?

Hello,

I have a simple dataset below.  How do I sum only the distinct CustomerID?

CustomerID  CLICKS  POS

1901            8            US

1901            8            US

1901            8            JP

2901            5            US

2901            5            US

Current Sum 8+8+8+5+5 = 34

Desire Sum 8+5 = 13

Do I use a aggr function to group by CustomerID, and take the avg of the clicks?  (8+8+8)/3=8

What would the expression look like?

Your help is much appreciated!

15 Replies
Not applicable
Author

1. What is the expected result if there is one more entry? e.g. 1901 9 US.

2. Check out the distinct keyword.

SunilChauhan
Champion II
Champion II

customer ID  in dimension and

avg(CLICKS )

or

aggr(avg(CLICKS ),CutomerId)

Sunil Chauhan
Not applicable
Author

Pls try this ...

=sum(DISTINCT Clikcs)

Regards,

Antony.

swuehl
MVP
MVP

I would suggest

=sum(aggr( sum(Distinct CLICKS),CustomerID))

This will show the same results as sum(aggr( avg(CLICKS),CustomerID)) or sum(aggr( max(CLICKS),CustomerID)) etc. with above sample data, but different if CustomerID might have multiple distinct CLICKS, like vishal_pai mentioned.

I believe sum(distinct CLICKS) will only work for distinct CLICKS across CustomerID, so if both customers would have 5 CLICKS on each record, I think the sum should be 10, not 5.

Hope this helps,

Stefan

Not applicable
Author

That's a very good point if a CustomerID has two or more distinct CLICKS (8 & 9).

I'm ok with simply taking the maximum (9), but definitely NOT (cannot be 8+9).  Basically a group by on CustomerID, and return one value only and then sum them up.

=sum(aggr( max(CLICKS),CustomerID))

I would assume, above is preferred expression as supposed to

=sum(aggr( sum(Distinct CLICKS),CustomerID))

Correct me if I am wrong.  Appreciate your help!

Not applicable
Author

1. Is your data set result of a join that cannot be avoided? E.g. Same customer id having same number of clicks across different POS does not look right.

2. Can you please define what you are trying to achieve? E.g. why are you ok with max clicks.

The reason I am asking these questions is following:

Although an expression might solve your short term problem for now it will not help when you are solving the bigger picture. E.g. Qlikview is powerful because selecting data in a chart / table et al updates the other views. And that is necessary to discover useful data. If your data model is not right and consistent, you will face a lot of problems going forward.

I will reply to the expressions part tomorrow.

Best Regards,
Vishal

Not applicable
Author

I provided the sample dataset above, but actually, my final report is group by POS.

Dataset:

CustomerID  CLICKS  POS

1901            8            US

1901            8            US

1901            8            JP

2901            5            US

2901            5            US

Desired output looks something as such:

POS          SUM           AVG

US           8+5              (8+5)/2 = 6.5  (2 is distinct CustomerID for this POS)

JP            8                  8/1 = 8

I tested the expression below, but is not returning the correct sum.  Do I need to aggr by POS as well?

=sum(aggr( max(CLICKS),CustomerID))

Vishal, yes, you brought up a very good point, because I am joining on different granularity.  The dataset is at placement level, but the CLICKS data is only at CustomerID level, so that's why there is repeated CLICKS in my dataset.  Would you guys recommend breaking out the CLICKS as a separate dataset, and let QV join simply on CustomerID & POS?

Your help is appreciated!

Not applicable
Author

I've attached the QVW.  I don't think aggr is summing correctly for A2 & A3 Bucket, also how to calculate avg?

FYI, I added Bucket to my dataset. 

Dataset:

CustomerID  CLICKS     POS   Bucket

1901            8            US     A1

1901            8            US     A1

1901            8            US     A2

1901            8            JP      A2

2901            5            US     A2

2901            5            US     A3

Desired output looks something as such:

Bucket    SUM                AVG 

A1           8                      8 / 1 = 8

A2           8+5 = 13          (8+5)/2 = 6.5  (2 is distinct CustomerID for this Bucket)

A3           5                      5 / 1 = 5

Not applicable
Author

1. Dimension Bucket and expression Avg(DISTINCT CLICKS) (Edit: the other expression is Sum(DISTINCT CLICKS)). But I am really not sure if it will solve all your problems going forward wrt consistency of the solution.

2. Regarding data re-modelling...I am not yet there to comment on it . From my reading on the community and blogs there is no one right solution. It depends on your situation. Check this one out. http://qlikviewnotes.blogspot.com/2009/11/understanding-join-and-concatenate.html

3. Less is always better. Every time I need to add a column to the dataset I worry because it might result in data set being inconsistent in different situations.

Best Regards,

Vishal