1 Reply Latest reply: May 10, 2016 3:24 PM by John Bosko RSS

    Create a Set of Duplicates ONLY

    Joey Lutes

      QLIK SENSE DESKTOP:

      I have a dataset where I have a need to display various views of rows which are DUPLICATES of other rows, needing to work with ONLY the duplicates.

       

      For example:

      (Hypothetical).  Patients visit an office.  Each one has an ailment.  Their visit and symptoms are logged, and if they are confirmed 'cured' that date is logged.  Frequently, the same patients come back.

      a)  the reference_id is a concatenation of name&symptom.

      b)  if a patient returns with the same symptoms, the visit_count is incremented by 1, no cure_date is entered, no new id is created

      c)  if a patient returns with new symptoms, a new record with new id and new initial_visit_date is created

       

      Sometimes, after a patient is cured, they will come back after being confirmed cured, with the same symptoms, creating a duplicated reference_id.  This could happen many times potentially - some with cure-dates, one without.

       

      I need a way to identify:

      • How many times has a recurrence occurred: count(reference_id) - count(distinct reference_id) works. - gives me the number of unique recurrences.
      • I also need a way to select that set of data and count the number of times within those recurrences patients have been cured vs have not been confirmed cured.   Basically, let's say I have 3 patients . . data below.

       

       

      id          name          symptom    reference_id   visit_count      initial_visit_date     cure_date

      1          John          earache          john_earache        3                     3/1                     -

      2          Mary          nosebleed     mary_nosebled       2                     3/2                    3/4

      3          Mary          nosebleed     mary_nosebleed     2                     3/6                    3/8

      4          Mary          headache     mary_headache      1                      3/7                    3/8

      5          Mary          nosebleed     mary_nosebleed     5                     3/9                     -

      6          Jim               arm              jim_arm               1                    3/10                  3/11

      7          Jim               arm              jim_arm               3                    3/15                   -

       

      John is easy.  He came in with an earache 3 times, has not yet been confirmed cured.  Open case.

      Mary however came in with a nosebleed, and it was confirmed cured.  then she came back - again confirmed cured.  Then again, now not confirmed cured.  However, her headache is gone.

      Jim the same with his arm.  Cured the first time, came back, now not cured yet.

       

      I'm looking for a way to select the set data for ONLY Mary(nosebleed) and Jim - and each row.

       

      I want to see:

      Mary came in 3 times for the same thing.  She was confirmed cured twice, has one open case - therefore indicating a potential chronic issue.

      Same with Jim.  I want to exclude John from this set of data as well as Mary with her headache.

       

      Reference_id                   Total Visits               Total Cures               Open Case Age

                                       sum(visit_count)          sum(visit_count)         today() - cure_date

       

      Total Unique Recurrences:  2

      Total Recurrence Instances: 5

      Total Cured Recurrences: 3 (cure_date exists)

      Total Open Recurrences:  2 (cure_date blank)

       

      I'm looking at a set analysis where if I could remove the non-duplicate reference_ids, it would leave the duplicate id rows after which I could apply count(${<cure_date-={'*']>reference_id) or the inverse - just having trouble getting that set of data to filter in the first place.

       

      I have become decent with set analysis but this is escaping me.  I'm not sure if a load script is in order, or if this should be within the charts themselves.  I think if someone could show me how to procure JUST the set of data that contains ONLY the duplicate rows, I could manage the rest . . .maybe?

       

      Help?

       

      Thanks!!