Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
See why IDC MarketScape names Qlik a 2025 Leader! Read more
cancel
Showing results for 
Search instead for 
Did you mean: 
betthisisnttake
Contributor III
Contributor III

Extracting unique values for aggregation - patient data

patient_idEpisode Numberdiagnosistotal_cost
12a1xblah11000
12a2yblah21000
13b4zblah13000
14c5xblah12500
14c6zblah22500
14c9xblah32500
15d8yblah11000

Hi there,

I hope I have posted this in the right place. If not let me know.

I have a large file and I have recreated the format in the example above.

Each patient_id is unique in the real world but each patient may have more than one episode number (also unique) and within each episode they may have more than one diagnosis (not unique). The presence of a few diagnosis/episodes for each patient means that the unique patient_ID and the total_costs for each patient appear more than once in the flat file.

I need to create a measure that extracts the total cost for each unique patient_id so that I can create an overall average total cost per patient as a KPI. I then need to use this measure in a chart with diagnosis as the dimension so i can then see the average total cost per patient associated with each diagnosis (or other fields I have in the real table).

So basically i need to sum the total cost for unique episode numbers and then sum these answers for the unique patient id's

When I just use avg(total_cost) - the results are way to high

Apologies in advance if there is something simple I am missing here but I am struggling with this. As a first step I've tried

sum(aggr(max([Episode Number]), total_cost)/count(aggr(max([Episode Number]),distinct [Episode Number])))  in an effort to start by pick out the unique episode ids using max and use aggr to create a temp table and then sum the total costs over this. I am then trying to divide by the number of unique Episode Numbers by counting them. This expression is not throwing an error but it is giving me a blank chart.

On top of this, I just cannot get to the next level and figure out how to work the unique patient_id's into this expression so the average is actually based on these.

Any help much appreciated

kindest regards,

Louise

18 Replies
effinty2112
Master
Master

Hi Louise,

I thought the Episode Numbers were unique. Why the emphasis then on distinct episode numbers ?

If unique then there would be a single cost associated with each one, no?

cheers

Andrew

betthisisnttake
Contributor III
Contributor III
Author

Hi kushal -thanks for your time on this,

In the real file there is potential for more than one diagnosis being associated with each episode for each patient (that is the only relevance of diagnosis in this question). Because there can be more than one diagnosis associated with an episode, it means that the same episode number might appear more than once for a patient_id. And because the same patient might have more than one episode the same patient_id might appear more than once in the file. (even though both these numbers are unique identifiers of patients and their episodes in the real world)

Bearing all that in mind I need the average cost per patient

The sample calculation would be as follows.....

  • total cost for patient 12a is calculated by adding the total costs for each of his episode numbers making sure to only pull out costs for unique episode numbers (1x + 2y) =1000+1000 = €2000, in my real file they may not be unique because of the diagnosis codes.
  • total cost for patient 13b is €3000 (he only has one episode number 4z)
  • total cost for patient 14c is calculated by adding the total costs for each of his episode numbers, making sure to only pull out total costs for unique episode numbers (5x, 6z, 9x) = 2500 +2500+2500 = €7500
  • total cost for patient 15d is €1000 (he only has one episode number 8y)

the average cost per patient is calculated by summing the total cost for each unique patient (12a=€2000, 13b=€3000,14c=€7500,15f=€1000) and dividing by the number of unique patients (4)

(2000+3000+7500+1000)/4 = 3375

As opposed to just averaging all the costs

1000+1000+3000+2500+2500+2500+1000/7 = 1928.57


I need to make sure that the overall average reflects the fact that patient 14c actually has a total cost of €7500 as opposed to being calculated on the assumption that there are 3 patients with lower costs because the expression pulls out the same patient number 3 times for the average. The uniqueness is important. This becomes important as I will also need to adjust the expression from calculating average to calculating median for other purposes.

Apologies for long message and if I am missing something obvious.

Louise

betthisisnttake
Contributor III
Contributor III
Author

Hi Andrew,

I think your answer for average total cost per patient is right but I am puzzled as to how it is being calculated as there is no part of the expression that is summing the total costs for each unique episode_id. The only relevance of diagnosis is that I may choose it as a dimension in a chart to get the values for each - so don't worry about that if constructing the expression.

If helpful, I've attached a copy of my answer to Kushal here where he requested what I would like the result to be and how I would like it calculated. It is a bit long winded but I am trying to ensure all relevant information is present.

all time & thoughts much appreciated.

Louise

Hi kushal -thanks for your time on this,

In the real file there is potential for more than one diagnosis being associated with each episode for each patient (that is the only relevance of diagnosis in this question). Because there can be more than one diagnosis associated with an episode, it means that the same episode number might appear more than once for a patient_id. And because the same patient might have more than one episode the same patient_id might appear more than once in the file. (even though both these numbers are unique identifiers of patients and their episodes in the real world)

Bearing all that in mind about the underlying data table structure, I still need to extract the average cost per patient

The sample calculation would be as follows.....

  • total cost for patient 12a is calculated by adding the total costs for each of his episode numbers making sure to only pull out costs for unique episode numbers (1x + 2y) =1000+1000 = €2000, in my real file they may not be unique because of the diagnosis codes.
  • total cost for patient 13b is €3000 (he only has one episode number 4z)
  • total cost for patient 14c is calculated by adding the total costs for each of his episode numbers, making sure to only pull out total costs for unique episode numbers (5x, 6z, 9x) = 2500 +2500+2500 = €7500
  • total cost for patient 15d is €1000 (he only has one episode number 8y)

the average cost per patient is calculated by summing the total cost for each unique patient (12a=€2000, 13b=€3000,14c=€7500,15f=€1000) and dividing by the number of unique patients (4)

(2000+3000+7500+1000)/4 = 3375

As opposed to just averaging all the costs

1000+1000+3000+2500+2500+2500+1000/7 = 1928.57


I need to make sure that the overall average reflects the fact that patient 14c actually has a total cost of €7500 as opposed to being calculated on the assumption that there are 3 patients with lower costs because the expression pulls out the same patient number 3 times for the average. The uniqueness is important. This becomes important as I will also need to adjust the expression from calculating average to calculating median for other purposes.

Apologies for long message and if I am missing something obvious.

Louise

Kushal_Chawda

you can try something like below

Data:

LOAD * Inline [

patient_id Episode Number diagnosis total_cost

12a 1x blah1 1000

12a 2y blah2 1000

13b 4z blah1 3000

14c 5x blah1 2500

14c 6z blah2 2500

14c 9x blah3 2500

15d 8y blah1 1000  ] (delimiter is '\t');

Left Join(Data)

LOAD patient_id,

    sum(total_cost) as TotalCostPatientWise

Resident Data

Group by patient_id;

Create straight table

Dimension:

diagnosis


Expression:

=sum(DISTINCT TotalCostPatientWise)/Count(DISTINCT total patient_id)


Capture.JPG



Kushal_Chawda

Capture.JPG

effinty2112
Master
Master

Hi Louise,

The expression I used

avg(Aggr(sum(total_cost),patient_id,diagnosis))

is equivalent to

avg(Aggr(sum(total_cost),[Episode Number]))

because, if I understand correctly, each episode is a patient_id, diagnosis pair.

patient_id diagnosis Episode Number sum(total_cost)sum(total_cost)/count([Episode Number])
12ablah11x10001000
blah22y10001000
Total 2000 1000
13bblah14z30003000
Total 3000 3000
14cblah15x25002500
blah26z25002500
blah39x25002500
Total 7500 2500
15dblah18y10001000
Total 1000 1000
Total 13500 1928.5714285714

We can take out dimension Episode Number and get the same result

patient_id diagnosis sum(total_cost)sum(total_cost)/count([Episode Number])
12ablah110001000
blah210001000
Total 2000 1000
13bblah130003000
Total 3000 3000
14cblah125002500
blah225002500
blah325002500
Total 7500 2500
15dblah110001000
Total 1000 1000
Total 13500 1928.5714285714

Or have Episode Number only

Episode Number sum(total_cost)sum(total_cost)/count([Episode Number])
1x10001000
2y10001000
4z30003000
5x25002500
6z25002500
8y10001000
9x25002500
Total 13500 1928.5714285714

Hope this helps.

Andrew

neelamsaroha157
Specialist II
Specialist II

Script:

Tab1:

LOAD * INLINE [

patient_id, Episode Number, diagnosis, total_cost

12a, 1x, blah1, 1000

12a, 2y, blah2, 1000

13b, 4z, blah1, 3000

14c, 5x, blah1, 2500

14c, 6z, blah2, 2500

14c, 9x, blah3, 2500

15d, 8y, blah1, 1000

];

NoConcatenate

LOAD patient_id,

Sum(total_cost) as NewCost

Resident Tab1

Group by patient_id

;

Expression:

Avg(NewCost)

Capture.PNG

neelamsaroha157
Specialist II
Specialist II

or if you want original values at Episode level then you can change expression as

If(RowNo() =0, Avg(NewCost), Sum(total_cost))

Capture.PNG

betthisisnttake
Contributor III
Contributor III
Author

Just to update - none of these ideas got me exactly what I needed, however they have provided me with couple of jump off points which have allowed me to combine a few ideas and get further than I was before. It looks like it can all be done via expressions and not involve load scripts and I am confident I can get it to work. For the moment I have to go do a Qlik Sense June 2017 server upgrade so I can't work on it now for a while. I will revert with the working expressions and close the question when I get back to it again. Many thanks for the insights and time, much appreciated.