Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Counting Distinct across dimensions

Hi,

Say I have a list of clients, and I want to create a count of unique client attendances over 2 years by age group.

Using a dimension of AGE_GROUP (grouping in 10 year blocks) and an expression of

COUNT(DISTINCT CLIENT_ID) (where the CLIENT_ID is unique to a client), I end up with something like

 

AGE_BAND# Clients
40 - 492718
50 - 593390
60 - 692216
70 - 74439
75 - 79129
80+31
40+8875

Now, from this you can see that the sum of the age groups (excluding 40+, it being the total) comes to 8923.

The number of distinct clients is in fact 8875, but there is an issue in that a client who attended when they were in the 40-49 range who then, within the two-year period attended when they were in the 50-59 period would be counted twice.

How do I calculate distinct clients by age band factoring for where that client attended while in an earlier age band?

Cheers,

Lee

1 Solution

Accepted Solutions
Not applicable
Author

Okay, thought I'd post an update on this - I did have to create a new structure around this.

For anyone who has to deal with a similar issue, my methodology was this

1: Get the minimum and maximum years available

LET vMinYr = YEAR($(vMinDate));

LET vMaxYr = YEAR($(vMaxDate));

2: Do a loop from through the years

FOR i =Num#($(vMinYr), '####') to Num($(vMaxYr))

LET j =Num#($(i), '####')+2; //To ensure 2 year blocks

3: Create a temporary table of sorted values

CLI_TMP:

LOAD

    [Fields],

    Text($(i)) & '-' & Text(Num#($(i), '####')+1) AS ATTENDANCE_RANGE, --i.e., 2013 - 2014

RESIDENT CLIENT_ATTENDANCES

WHERE Num#(YEAR(ATTENDANCE_DATE))>=$(i) AND Num#(YEAR(ATTENDANCE_DATE)) < $(j)

ORDER BY CLI_ID ASC, ATTENDANCE_DATE ASC

;

4: Screen the temp table for duplicates into the final table - I needed to do this because using peek/previous in the above build didn't seem to filter duplicates properly


LOAD

     [Fields]

    IF(CLI_ID <> Previous(CLI_ID),1,0) AS ATTENDANCE_FLAG --

RESIDENT CLI_TMP

ORDER BY CLI_ID ASC, ATTENDANCE_DATE ASC -- to be sure to be sure...

;

5: Drop the tmp table

Drop table CLI_TMP



Next i;

It's a bit convoluted but it works. It would be nice if there was a more elegant solutions though.



View solution in original post

4 Replies
sunny_talwar

I would suggest creating a flag in the script for clients 1st visit. So Something like this:

LOAD CLIENT_ID,

          If(CLIENT_ID= Previous(CLIENT_ID), 0, 1) as Flag

Resident ....

Order By Client, Age;

Now use this as your expression

Count(DISTINCT {<Flag  = {1}>} CLIENT_ID)

Not applicable
Author

Thanks Sunny - In this case, this is not going to work (I don't think - not without some refinement). The analysis is performed over 2-year blocks, so the end table would look like:

  

2009–20102010–20112011–20122012–20132013–2014
40–49 years xxxxxxxxxx
50–59 years xxxxxxxxxx
60–69 yearsxxxxxxxxxx
70–79 yrsxxxxxxxxxx
80+ yearsxxxxxxxxxx
Total 40+ YearsSum of aboveSum of aboveSum of aboveSum of above

Sum of above

Now the issue lies in that a client who attends in 2009 for the first time, and then in 2010, and then in 2011 needs to be only counted once in each block, irrespective of the age at attendance. For example, the client is 49 at their 2009 attendance and 50 at their 2010 attendance, then the count should only have them at the 40-49 band for the 2009-2010 column, but then they should also be counted once again in the 50-59 years band of the 2010-2011 column, and so forth.

I suspect I will have to build a new table to factor for this, but was hoping that set analysis could handle it.

Any thoughts or suggestions on an approach appreciated,

Cheers,


Lee

sunny_talwar

I think this could still work but will have to be played around with. If you can provide few sample rows of data with the expected output, someone here might be able to help you achieve your desired output.

Best,

Sunny

Not applicable
Author

Okay, thought I'd post an update on this - I did have to create a new structure around this.

For anyone who has to deal with a similar issue, my methodology was this

1: Get the minimum and maximum years available

LET vMinYr = YEAR($(vMinDate));

LET vMaxYr = YEAR($(vMaxDate));

2: Do a loop from through the years

FOR i =Num#($(vMinYr), '####') to Num($(vMaxYr))

LET j =Num#($(i), '####')+2; //To ensure 2 year blocks

3: Create a temporary table of sorted values

CLI_TMP:

LOAD

    [Fields],

    Text($(i)) & '-' & Text(Num#($(i), '####')+1) AS ATTENDANCE_RANGE, --i.e., 2013 - 2014

RESIDENT CLIENT_ATTENDANCES

WHERE Num#(YEAR(ATTENDANCE_DATE))>=$(i) AND Num#(YEAR(ATTENDANCE_DATE)) < $(j)

ORDER BY CLI_ID ASC, ATTENDANCE_DATE ASC

;

4: Screen the temp table for duplicates into the final table - I needed to do this because using peek/previous in the above build didn't seem to filter duplicates properly


LOAD

     [Fields]

    IF(CLI_ID <> Previous(CLI_ID),1,0) AS ATTENDANCE_FLAG --

RESIDENT CLI_TMP

ORDER BY CLI_ID ASC, ATTENDANCE_DATE ASC -- to be sure to be sure...

;

5: Drop the tmp table

Drop table CLI_TMP



Next i;

It's a bit convoluted but it works. It would be nice if there was a more elegant solutions though.