Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
Say I have a list of clients, and I want to create a count of unique client attendances over 2 years by age group.
Using a dimension of AGE_GROUP (grouping in 10 year blocks) and an expression of
COUNT(DISTINCT CLIENT_ID) (where the CLIENT_ID is unique to a client), I end up with something like
AGE_BAND | # Clients |
40 - 49 | 2718 |
50 - 59 | 3390 |
60 - 69 | 2216 |
70 - 74 | 439 |
75 - 79 | 129 |
80+ | 31 |
40+ | 8875 |
Now, from this you can see that the sum of the age groups (excluding 40+, it being the total) comes to 8923.
The number of distinct clients is in fact 8875, but there is an issue in that a client who attended when they were in the 40-49 range who then, within the two-year period attended when they were in the 50-59 period would be counted twice.
How do I calculate distinct clients by age band factoring for where that client attended while in an earlier age band?
Cheers,
Lee
Okay, thought I'd post an update on this - I did have to create a new structure around this.
For anyone who has to deal with a similar issue, my methodology was this
1: Get the minimum and maximum years available
LET vMinYr = YEAR($(vMinDate));
LET vMaxYr = YEAR($(vMaxDate));
2: Do a loop from through the years
FOR i =Num#($(vMinYr), '####') to Num($(vMaxYr))
LET j =Num#($(i), '####')+2; //To ensure 2 year blocks
3: Create a temporary table of sorted values
CLI_TMP:
LOAD
[Fields],
Text($(i)) & '-' & Text(Num#($(i), '####')+1) AS ATTENDANCE_RANGE, --i.e., 2013 - 2014
RESIDENT CLIENT_ATTENDANCES
WHERE Num#(YEAR(ATTENDANCE_DATE))>=$(i) AND Num#(YEAR(ATTENDANCE_DATE)) < $(j)
ORDER BY CLI_ID ASC, ATTENDANCE_DATE ASC
;
4: Screen the temp table for duplicates into the final table - I needed to do this because using peek/previous in the above build didn't seem to filter duplicates properly
LOAD
[Fields]
IF(CLI_ID <> Previous(CLI_ID),1,0) AS ATTENDANCE_FLAG --
RESIDENT CLI_TMP
ORDER BY CLI_ID ASC, ATTENDANCE_DATE ASC -- to be sure to be sure...
;
5: Drop the tmp table
Drop table CLI_TMP
Next i;
It's a bit convoluted but it works. It would be nice if there was a more elegant solutions though.
I would suggest creating a flag in the script for clients 1st visit. So Something like this:
LOAD CLIENT_ID,
If(CLIENT_ID= Previous(CLIENT_ID), 0, 1) as Flag
Resident ....
Order By Client, Age;
Now use this as your expression
Count(DISTINCT {<Flag = {1}>} CLIENT_ID)
Thanks Sunny - In this case, this is not going to work (I don't think - not without some refinement). The analysis is performed over 2-year blocks, so the end table would look like:
2009–2010 | 2010–2011 | 2011–2012 | 2012–2013 | 2013–2014 | |
40–49 years | xx | xx | xx | xx | xx |
50–59 years | xx | xx | xx | xx | xx |
60–69 years | xx | xx | xx | xx | xx |
70–79 yrs | xx | xx | xx | xx | xx |
80+ years | xx | xx | xx | xx | xx |
Total 40+ Years | Sum of above | Sum of above | Sum of above | Sum of above | Sum of above |
Now the issue lies in that a client who attends in 2009 for the first time, and then in 2010, and then in 2011 needs to be only counted once in each block, irrespective of the age at attendance. For example, the client is 49 at their 2009 attendance and 50 at their 2010 attendance, then the count should only have them at the 40-49 band for the 2009-2010 column, but then they should also be counted once again in the 50-59 years band of the 2010-2011 column, and so forth.
I suspect I will have to build a new table to factor for this, but was hoping that set analysis could handle it.
Any thoughts or suggestions on an approach appreciated,
Cheers,
Lee
I think this could still work but will have to be played around with. If you can provide few sample rows of data with the expected output, someone here might be able to help you achieve your desired output.
Best,
Sunny
Okay, thought I'd post an update on this - I did have to create a new structure around this.
For anyone who has to deal with a similar issue, my methodology was this
1: Get the minimum and maximum years available
LET vMinYr = YEAR($(vMinDate));
LET vMaxYr = YEAR($(vMaxDate));
2: Do a loop from through the years
FOR i =Num#($(vMinYr), '####') to Num($(vMaxYr))
LET j =Num#($(i), '####')+2; //To ensure 2 year blocks
3: Create a temporary table of sorted values
CLI_TMP:
LOAD
[Fields],
Text($(i)) & '-' & Text(Num#($(i), '####')+1) AS ATTENDANCE_RANGE, --i.e., 2013 - 2014
RESIDENT CLIENT_ATTENDANCES
WHERE Num#(YEAR(ATTENDANCE_DATE))>=$(i) AND Num#(YEAR(ATTENDANCE_DATE)) < $(j)
ORDER BY CLI_ID ASC, ATTENDANCE_DATE ASC
;
4: Screen the temp table for duplicates into the final table - I needed to do this because using peek/previous in the above build didn't seem to filter duplicates properly
LOAD
[Fields]
IF(CLI_ID <> Previous(CLI_ID),1,0) AS ATTENDANCE_FLAG --
RESIDENT CLI_TMP
ORDER BY CLI_ID ASC, ATTENDANCE_DATE ASC -- to be sure to be sure...
;
5: Drop the tmp table
Drop table CLI_TMP
Next i;
It's a bit convoluted but it works. It would be nice if there was a more elegant solutions though.