Skip to main content
Announcements
NEW: Seamless Public Data Sharing with Qlik's New Anonymous Access Capability: TELL ME MORE!
cancel
Showing results for 
Search instead for 
Did you mean: 
HonestToad
Partner - Contributor II
Partner - Contributor II

Merging two separate ID systems to identify unique individuals

I am in the process of merging data fields from two seperate tables, both tables are managed independently of each other but do reference the corresponding tables ID number when updated.  For the majority of cases there is a direct 1 to 1 relationship between the different but corresponding ID number in the other table.  However, I am finding anomalies where the ID number in one or the other table has changed over time but the other tables ID number has remained the same and I am looking for a way to ride the logic to identify all the unique individuals. 

Let me give you an example:

 UniquePersonLookup:
LOAD * INLINE
[
ID_System1,ID_System2
130169,1536485
130169,15731010
468011,15731010
468011,19291244
317480,24864038
317480,8365995
606457,8365995
340578,19051698
340578,11625124
438534,11625124
500704,18005302
500704,19514513
500668,19514513
](delimiter is ',');

By looking at and drilling down on the duplicate value between ID_System1 and ID_System2, I would like to be able to report that there are 4 unique individuals within this dataset, ie:

Unique Individual ID AddedUnique Individual ID Added

My initial approach was to use Group By and the CONCAT script function, but once I found instances where the other non-grouped variable could also identify potential family members this approach fell short.  I am now thinking this requires a looped function, but I could do with somebody signposting me on this please...



Labels (4)
1 Solution

Accepted Solutions
Kushal_Chawda

try below

 

Data:
LOAD * ,
    if(Exists(ID_System2),1,0) as ID_System2_Flag
   INLINE
[
ID_System1,ID_System2
130169,1536485
130169,15731010
468011,15731010
468011,19291244
317480,24864038
317480,8365995
606457,8365995
340578,19051698
340578,11625124
438534,11625124
500704,18005302
500704,19514513
500668,19514513
](delimiter is ',');

Left Join(Data)
Load ID_System2,
     rangesum(Peek(Group1),ID_System2_Flag) as Group1
Resident Data
where ID_System2_Flag=1;

Left Join(Data)
Load ID_System1,
     Group1 as Group
Resident Data
where len(trim(Group1))>0;

Drop Fields Group1,ID_System2_Flag;

 

 

Annotation 2020-07-17 142602.png

View solution in original post

8 Replies
Kushal_Chawda

How you are grouping unique individual ID?

HonestToad
Partner - Contributor II
Partner - Contributor II
Author

I am trying to group together all common ID_System1 and ID_System2 numbers that are linked together by having a common partnership on a row.  So, in my first example, ID_System1 number 130169 is linked to two ID_System2 numbers 1536485 and 15731010 so we can be confident that these two ID_System2 records belong to the same person.  Whilst IS_System2 number 1536485 is unique, ID_System2 number 15731010 is also linked to ID_System1 number 468011 in addition to ID_System1 number 130169, so we can now deduce that ID_System1 number 130169 and 468011 are in fact the same person by virtue of sharing one instance where they had a common ID_System2 number.  A further search on the newly identified ID_System1 number 468011 returns an additional ID_System2 number 19291244 which happens to be unique.  From this, we can further deduce that ID_System2 numbers 15364865, 15731010 and 19291244 also relate the same person due to having rows that share the same ID_System1 numbers.  Does that make sense?

Kushal_Chawda

got you.

Kushal_Chawda

do you want grouping number as described here or is it fine to get random grouping number with correct grouping.  eg. for 130169 and 468011 do you want grouping as 1 only or it could be fine if randomly get 4 or 3 as well?

HonestToad
Partner - Contributor II
Partner - Contributor II
Author

The actual number returned is not important as long is its unique to the individual and an individual only receives one number

Kushal_Chawda

try below

 

Data:
LOAD * ,
    if(Exists(ID_System2),1,0) as ID_System2_Flag
   INLINE
[
ID_System1,ID_System2
130169,1536485
130169,15731010
468011,15731010
468011,19291244
317480,24864038
317480,8365995
606457,8365995
340578,19051698
340578,11625124
438534,11625124
500704,18005302
500704,19514513
500668,19514513
](delimiter is ',');

Left Join(Data)
Load ID_System2,
     rangesum(Peek(Group1),ID_System2_Flag) as Group1
Resident Data
where ID_System2_Flag=1;

Left Join(Data)
Load ID_System1,
     Group1 as Group
Resident Data
where len(trim(Group1))>0;

Drop Fields Group1,ID_System2_Flag;

 

 

Annotation 2020-07-17 142602.png

HonestToad
Partner - Contributor II
Partner - Contributor II
Author

Wow, this is amazing and so elegant.  I tried to understand your steps, but you lost me on the rangesum line - how can you use Group1 without it ever being defined?

Kushal_Chawda

Basically I have used Group1 within  peek function to reference the already loaded  previous record during the script run. Within peek you can refer the column which is being created in current load. You can read more about peek function here