Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
betthisisnttake
Contributor III
Contributor III

Extracting unique values for aggregation - patient data

patient_idEpisode Numberdiagnosistotal_cost
12a1xblah11000
12a2yblah21000
13b4zblah13000
14c5xblah12500
14c6zblah22500
14c9xblah32500
15d8yblah11000

Hi there,

I hope I have posted this in the right place. If not let me know.

I have a large file and I have recreated the format in the example above.

Each patient_id is unique in the real world but each patient may have more than one episode number (also unique) and within each episode they may have more than one diagnosis (not unique). The presence of a few diagnosis/episodes for each patient means that the unique patient_ID and the total_costs for each patient appear more than once in the flat file.

I need to create a measure that extracts the total cost for each unique patient_id so that I can create an overall average total cost per patient as a KPI. I then need to use this measure in a chart with diagnosis as the dimension so i can then see the average total cost per patient associated with each diagnosis (or other fields I have in the real table).

So basically i need to sum the total cost for unique episode numbers and then sum these answers for the unique patient id's

When I just use avg(total_cost) - the results are way to high

Apologies in advance if there is something simple I am missing here but I am struggling with this. As a first step I've tried

sum(aggr(max([Episode Number]), total_cost)/count(aggr(max([Episode Number]),distinct [Episode Number])))  in an effort to start by pick out the unique episode ids using max and use aggr to create a temp table and then sum the total costs over this. I am then trying to divide by the number of unique Episode Numbers by counting them. This expression is not throwing an error but it is giving me a blank chart.

On top of this, I just cannot get to the next level and figure out how to work the unique patient_id's into this expression so the average is actually based on these.

Any help much appreciated

kindest regards,

Louise

18 Replies
neelamsaroha157
Specialist II
Specialist II

Let me know if this is what you are looking for.

I have tried to get the avg by patient & Episode number with multiple methods.

Let me know if you are looking for different output.

betthisisnttake
Contributor III
Contributor III
Author

Hi,

@I really appreciate the time you have taken to help with this. Unfortunately, I can't open your qlikview .qvw file as I am working in either QlikSense server or Qliksense desktop. I can see only the load script and the data model when i try to open it in qliksense. I can only open .qvf files.

If it is not too much trouble could you post the actual expressions you have compiled and I will try and replicate them?

kind regards,

Louise

marcus_malinow
Partner - Specialist III
Partner - Specialist III

how about

avg(

    aggr(

        sum(total_cost),

        patient_id)

    )

?

effinty2112
Master
Master

Hi Louise,

Maybe

Average total cost per patient as a KPI

avg(Aggr(sum(total_cost),patient_id)) = 3375

Average total cost per patient associated with each diagnosis

diagnosis avg(Aggr(sum(total_cost),patient_id,diagnosis))
blah11875
blah21750
blah32500


Andrew

betthisisnttake
Contributor III
Contributor III
Author

Thanks Marcus - I've had a look at that - I need to take the average total costs of the distinct episode numbers for each patient before I then take the average of those figures calculated on each distinct patient id. Which i think is a nested aggregation and not allowed?

betthisisnttake
Contributor III
Contributor III
Author

Hi Andrew,

Thank you for that, again much appreciated, but I have the same problem there, yes it looks like that is giving me an average of the total costs for each patient but before that calculation, I need to ensure that each patients total cost value is comprised of the total of his episode number costs.

In the real world - a patient might come to the hospital once - he has one episode that costs €1000 - that is the figure used in the sum part of the calculation of the average cost for a patient - simple enough.

However in some cases the patient will present at the hospital 2 or 3 times each time has a unique episode number and is associated with its own cost. These costs have to be summed for each patient to get their real total cost figure and it is this figure that is included in the 'average per patient' calculation. This is the bit I am missing prior to the average for each patient.

I am not too worried about the diagnosis as I can just pick that as a dimension on a chart once I have the expression for doing the above calculation sorted.

Apologies for melting peoples heads.

Louise

betthisisnttake
Contributor III
Contributor III
Author

To clarify - in the above example patient 14 c has 3 episode numbers (5x, 6z and 9x) each with an associated cost of (2500, 2500 and 2500 respectively - terrible example - should have made those numbers different), the costs for each of these has to be summed to get the total cost for patient 14c (which is 2500 +2500+2500 = 7500), that is the total cost used for patient 14 c when the average total cost per patient is being calculated.

effinty2112
Master
Master

Hi Louise,

Apologies if I'm not following you but do you want the average cost of diagnosis per patient?

=avg(aggr(Sum(total_cost),diagnosis,patient_id))

1928.5714

diagnosis avg(Aggr(sum(total_cost),patient_id,diagnosis))
1928.5714
blah11875
blah21750
blah32500

Switching the dimension give the same overall average

patient_id avg(Aggr(sum(total_cost),patient_id,diagnosis))
1928.5714
12a1000
15d1000
14c2500
13b3000

Cheers

Andrew

Kushal_Chawda

If you can provide the expected output for your above data, that will be helpful to give you exactly what you want