Create a Set of Duplicates ONLY

joey_lutes · ‎2016-04-27

QLIK SENSE DESKTOP:

I have a dataset where I have a need to display various views of rows which are DUPLICATES of other rows, needing to work with ONLY the duplicates.

For example:

(Hypothetical). Patients visit an office. Each one has an ailment. Their visit and symptoms are logged, and if they are confirmed 'cured' that date is logged. Frequently, the same patients come back.

a) the reference_id is a concatenation of name&symptom.

b) if a patient returns with the same symptoms, the visit_count is incremented by 1, no cure_date is entered, no new id is created

c) if a patient returns with new symptoms, a new record with new id and new initial_visit_date is created

Sometimes, after a patient is cured, they will come back after being confirmed cured, with the same symptoms, creating a duplicated reference_id. This could happen many times potentially - some with cure-dates, one without.

I need a way to identify:

How many times has a recurrence occurred: count(reference_id) - count(distinct reference_id) works. - gives me the number of unique recurrences.
I also need a way to select that set of data and count the number of times within those recurrences patients have been cured vs have not been confirmed cured. Basically, let's say I have 3 patients . . data below.

id name symptom reference_id visit_count initial_visit_date cure_date

1 John earache john_earache 3 3/1 -

2 Mary nosebleed mary_nosebled 2 3/2 3/4

3 Mary nosebleed mary_nosebleed 2 3/6 3/8

4 Mary headache mary_headache 1 3/7 3/8

5 Mary nosebleed mary_nosebleed 5 3/9 -

6 Jim arm jim_arm 1 3/10 3/11

7 Jim arm jim_arm 3 3/15 -

John is easy. He came in with an earache 3 times, has not yet been confirmed cured. Open case.

Mary however came in with a nosebleed, and it was confirmed cured. then she came back - again confirmed cured. Then again, now not confirmed cured. However, her headache is gone.

Jim the same with his arm. Cured the first time, came back, now not cured yet.

I'm looking for a way to select the set data for ONLY Mary(nosebleed) and Jim - and each row.

I want to see:

Mary came in 3 times for the same thing. She was confirmed cured twice, has one open case - therefore indicating a potential chronic issue.

Same with Jim. I want to exclude John from this set of data as well as Mary with her headache.

Reference_id Total Visits Total Cures Open Case Age

sum(visit_count) sum(visit_count) today() - cure_date

Total Unique Recurrences: 2

Total Recurrence Instances: 5

Total Cured Recurrences: 3 (cure_date exists)

Total Open Recurrences: 2 (cure_date blank)

I'm looking at a set analysis where if I could remove the non-duplicate reference_ids, it would leave the duplicate id rows after which I could apply count(${<cure_date-={'*']>reference_id) or the inverse - just having trouble getting that set of data to filter in the first place.

I have become decent with set analysis but this is escaping me. I'm not sure if a load script is in order, or if this should be within the charts themselves. I think if someone could show me how to procure JUST the set of data that contains ONLY the duplicate rows, I could manage the rest . . .maybe?

Help?

Thanks!!

jok · ‎2016-05-10

There are several posts related to finding duplicates. Below is a link to one of those. Hope this helps.

mark duplicates? | Qlik Community

function