Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
patient_id | Episode Number | diagnosis | total_cost |
12a | 1x | blah1 | 1000 |
12a | 2y | blah2 | 1000 |
13b | 4z | blah1 | 3000 |
14c | 5x | blah1 | 2500 |
14c | 6z | blah2 | 2500 |
14c | 9x | blah3 | 2500 |
15d | 8y | blah1 | 1000 |
Hi there,
I hope I have posted this in the right place. If not let me know.
I have a large file and I have recreated the format in the example above.
Each patient_id is unique in the real world but each patient may have more than one episode number (also unique) and within each episode they may have more than one diagnosis (not unique). The presence of a few diagnosis/episodes for each patient means that the unique patient_ID and the total_costs for each patient appear more than once in the flat file.
I need to create a measure that extracts the total cost for each unique patient_id so that I can create an overall average total cost per patient as a KPI. I then need to use this measure in a chart with diagnosis as the dimension so i can then see the average total cost per patient associated with each diagnosis (or other fields I have in the real table).
So basically i need to sum the total cost for unique episode numbers and then sum these answers for the unique patient id's
When I just use avg(total_cost) - the results are way to high
Apologies in advance if there is something simple I am missing here but I am struggling with this. As a first step I've tried
sum(aggr(max([Episode Number]), total_cost)/count(aggr(max([Episode Number]),distinct [Episode Number]))) in an effort to start by pick out the unique episode ids using max and use aggr to create a temp table and then sum the total costs over this. I am then trying to divide by the number of unique Episode Numbers by counting them. This expression is not throwing an error but it is giving me a blank chart.
On top of this, I just cannot get to the next level and figure out how to work the unique patient_id's into this expression so the average is actually based on these.
Any help much appreciated
kindest regards,
Louise
Hi Louise,
I thought the Episode Numbers were unique. Why the emphasis then on distinct episode numbers ?
If unique then there would be a single cost associated with each one, no?
cheers
Andrew
Hi kushal -thanks for your time on this,
In the real file there is potential for more than one diagnosis being associated with each episode for each patient (that is the only relevance of diagnosis in this question). Because there can be more than one diagnosis associated with an episode, it means that the same episode number might appear more than once for a patient_id. And because the same patient might have more than one episode the same patient_id might appear more than once in the file. (even though both these numbers are unique identifiers of patients and their episodes in the real world)
Bearing all that in mind I need the average cost per patient
The sample calculation would be as follows.....
the average cost per patient is calculated by summing the total cost for each unique patient (12a=€2000, 13b=€3000,14c=€7500,15f=€1000) and dividing by the number of unique patients (4)
(2000+3000+7500+1000)/4 = 3375
As opposed to just averaging all the costs
1000+1000+3000+2500+2500+2500+1000/7 = 1928.57
I need to make sure that the overall average reflects the fact that patient 14c actually has a total cost of €7500 as opposed to being calculated on the assumption that there are 3 patients with lower costs because the expression pulls out the same patient number 3 times for the average. The uniqueness is important. This becomes important as I will also need to adjust the expression from calculating average to calculating median for other purposes.
Apologies for long message and if I am missing something obvious.
Louise
Hi Andrew,
I think your answer for average total cost per patient is right but I am puzzled as to how it is being calculated as there is no part of the expression that is summing the total costs for each unique episode_id. The only relevance of diagnosis is that I may choose it as a dimension in a chart to get the values for each - so don't worry about that if constructing the expression.
If helpful, I've attached a copy of my answer to Kushal here where he requested what I would like the result to be and how I would like it calculated. It is a bit long winded but I am trying to ensure all relevant information is present.
all time & thoughts much appreciated.
Louise
Hi kushal -thanks for your time on this,
In the real file there is potential for more than one diagnosis being associated with each episode for each patient (that is the only relevance of diagnosis in this question). Because there can be more than one diagnosis associated with an episode, it means that the same episode number might appear more than once for a patient_id. And because the same patient might have more than one episode the same patient_id might appear more than once in the file. (even though both these numbers are unique identifiers of patients and their episodes in the real world)
Bearing all that in mind about the underlying data table structure, I still need to extract the average cost per patient
The sample calculation would be as follows.....
the average cost per patient is calculated by summing the total cost for each unique patient (12a=€2000, 13b=€3000,14c=€7500,15f=€1000) and dividing by the number of unique patients (4)
(2000+3000+7500+1000)/4 = 3375
As opposed to just averaging all the costs
1000+1000+3000+2500+2500+2500+1000/7 = 1928.57
I need to make sure that the overall average reflects the fact that patient 14c actually has a total cost of €7500 as opposed to being calculated on the assumption that there are 3 patients with lower costs because the expression pulls out the same patient number 3 times for the average. The uniqueness is important. This becomes important as I will also need to adjust the expression from calculating average to calculating median for other purposes.
Apologies for long message and if I am missing something obvious.
Louise
you can try something like below
Data:
LOAD * Inline [
patient_id Episode Number diagnosis total_cost
12a 1x blah1 1000
12a 2y blah2 1000
13b 4z blah1 3000
14c 5x blah1 2500
14c 6z blah2 2500
14c 9x blah3 2500
15d 8y blah1 1000 ] (delimiter is '\t');
Left Join(Data)
LOAD patient_id,
sum(total_cost) as TotalCostPatientWise
Resident Data
Group by patient_id;
Create straight table
Dimension:
diagnosis
Expression:
=sum(DISTINCT TotalCostPatientWise)/Count(DISTINCT total patient_id)
Hi Louise,
The expression I used
avg(Aggr(sum(total_cost),patient_id,diagnosis))
is equivalent to
avg(Aggr(sum(total_cost),[Episode Number]))
because, if I understand correctly, each episode is a patient_id, diagnosis pair.
patient_id | diagnosis | Episode Number | sum(total_cost) | sum(total_cost)/count([Episode Number]) |
---|---|---|---|---|
12a | blah1 | 1x | 1000 | 1000 |
blah2 | 2y | 1000 | 1000 | |
Total | 2000 | 1000 | ||
13b | blah1 | 4z | 3000 | 3000 |
Total | 3000 | 3000 | ||
14c | blah1 | 5x | 2500 | 2500 |
blah2 | 6z | 2500 | 2500 | |
blah3 | 9x | 2500 | 2500 | |
Total | 7500 | 2500 | ||
15d | blah1 | 8y | 1000 | 1000 |
Total | 1000 | 1000 | ||
Total | 13500 | 1928.5714285714 |
We can take out dimension Episode Number and get the same result
patient_id | diagnosis | sum(total_cost) | sum(total_cost)/count([Episode Number]) |
---|---|---|---|
12a | blah1 | 1000 | 1000 |
blah2 | 1000 | 1000 | |
Total | 2000 | 1000 | |
13b | blah1 | 3000 | 3000 |
Total | 3000 | 3000 | |
14c | blah1 | 2500 | 2500 |
blah2 | 2500 | 2500 | |
blah3 | 2500 | 2500 | |
Total | 7500 | 2500 | |
15d | blah1 | 1000 | 1000 |
Total | 1000 | 1000 | |
Total | 13500 | 1928.5714285714 |
Or have Episode Number only
Episode Number | sum(total_cost) | sum(total_cost)/count([Episode Number]) |
---|---|---|
1x | 1000 | 1000 |
2y | 1000 | 1000 |
4z | 3000 | 3000 |
5x | 2500 | 2500 |
6z | 2500 | 2500 |
8y | 1000 | 1000 |
9x | 2500 | 2500 |
Total | 13500 | 1928.5714285714 |
Hope this helps.
Andrew
Script:
Tab1:
LOAD * INLINE [
patient_id, Episode Number, diagnosis, total_cost
12a, 1x, blah1, 1000
12a, 2y, blah2, 1000
13b, 4z, blah1, 3000
14c, 5x, blah1, 2500
14c, 6z, blah2, 2500
14c, 9x, blah3, 2500
15d, 8y, blah1, 1000
];
NoConcatenate
LOAD patient_id,
Sum(total_cost) as NewCost
Resident Tab1
Group by patient_id
;
Expression:
Avg(NewCost)
or if you want original values at Episode level then you can change expression as
If(RowNo() =0, Avg(NewCost), Sum(total_cost))
Just to update - none of these ideas got me exactly what I needed, however they have provided me with couple of jump off points which have allowed me to combine a few ideas and get further than I was before. It looks like it can all be done via expressions and not involve load scripts and I am confident I can get it to work. For the moment I have to go do a Qlik Sense June 2017 server upgrade so I can't work on it now for a while. I will revert with the working expressions and close the question when I get back to it again. Many thanks for the insights and time, much appreciated.