Solved: Re: Merging two separate ID systems to identify un... - Qlik Community

HonestToad · ‎2020-07-17

I am in the process of merging data fields from two seperate tables, both tables are managed independently of each other but do reference the corresponding tables ID number when updated. For the majority of cases there is a direct 1 to 1 relationship between the different but corresponding ID number in the other table. However, I am finding anomalies where the ID number in one or the other table has changed over time but the other tables ID number has remained the same and I am looking for a way to ride the logic to identify all the unique individuals.

Let me give you an example:

UniquePersonLookup:
LOAD * INLINE
[
ID_System1,ID_System2
130169,1536485
130169,15731010
468011,15731010
468011,19291244
317480,24864038
317480,8365995
606457,8365995
340578,19051698
340578,11625124
438534,11625124
500704,18005302
500704,19514513
500668,19514513
](delimiter is ',');

By looking at and drilling down on the duplicate value between ID_System1 and ID_System2, I would like to be able to report that there are 4 unique individuals within this dataset, ie:

Unique Individual ID Added

My initial approach was to use Group By and the CONCAT script function, but once I found instances where the other non-grouped variable could also identify potential family members this approach fell short. I am now thinking this requires a looped function, but I could do with somebody signposting me on this please...

Kushal_Chawda · ‎2020-07-17

try below

Data:
LOAD * ,
    if(Exists(ID_System2),1,0) as ID_System2_Flag
   INLINE
[
ID_System1,ID_System2
130169,1536485
130169,15731010
468011,15731010
468011,19291244
317480,24864038
317480,8365995
606457,8365995
340578,19051698
340578,11625124
438534,11625124
500704,18005302
500704,19514513
500668,19514513
](delimiter is ',');

Left Join(Data)
Load ID_System2,
     rangesum(Peek(Group1),ID_System2_Flag) as Group1
Resident Data
where ID_System2_Flag=1;

Left Join(Data)
Load ID_System1,
     Group1 as Group
Resident Data
where len(trim(Group1))>0;

Drop Fields Group1,ID_System2_Flag;

View solution in original post

Kushal_Chawda · ‎2020-07-17

How you are grouping unique individual ID?

HonestToad · ‎2020-07-17

I am trying to group together all common ID_System1 and ID_System2 numbers that are linked together by having a common partnership on a row. So, in my first example, ID_System1 number 130169 is linked to two ID_System2 numbers 1536485 and 15731010 so we can be confident that these two ID_System2 records belong to the same person. Whilst IS_System2 number 1536485 is unique, ID_System2 number 15731010 is also linked to ID_System1 number 468011 in addition to ID_System1 number 130169, so we can now deduce that ID_System1 number 130169 and 468011 are in fact the same person by virtue of sharing one instance where they had a common ID_System2 number. A further search on the newly identified ID_System1 number 468011 returns an additional ID_System2 number 19291244 which happens to be unique. From this, we can further deduce that ID_System2 numbers 15364865, 15731010 and 19291244 also relate the same person due to having rows that share the same ID_System1 numbers. Does that make sense?

Kushal_Chawda · ‎2020-07-17

got you.

Kushal_Chawda · ‎2020-07-17

do you want grouping number as described here or is it fine to get random grouping number with correct grouping. eg. for 130169 and 468011 do you want grouping as 1 only or it could be fine if randomly get 4 or 3 as well?

HonestToad · ‎2020-07-17

The actual number returned is not important as long is its unique to the individual and an individual only receives one number

Kushal_Chawda · ‎2020-07-17

try below

Data:
LOAD * ,
    if(Exists(ID_System2),1,0) as ID_System2_Flag
   INLINE
[
ID_System1,ID_System2
130169,1536485
130169,15731010
468011,15731010
468011,19291244
317480,24864038
317480,8365995
606457,8365995
340578,19051698
340578,11625124
438534,11625124
500704,18005302
500704,19514513
500668,19514513
](delimiter is ',');

Left Join(Data)
Load ID_System2,
     rangesum(Peek(Group1),ID_System2_Flag) as Group1
Resident Data
where ID_System2_Flag=1;

Left Join(Data)
Load ID_System1,
     Group1 as Group
Resident Data
where len(trim(Group1))>0;

Drop Fields Group1,ID_System2_Flag;

HonestToad · ‎2020-07-17

Wow, this is amazing and so elegant. I tried to understand your steps, but you lost me on the rangesum line - how can you use Group1 without it ever being defined?

Kushal_Chawda · ‎2020-07-17

Basically I have used Group1 within peek function to reference the already loaded previous record during the script run. Within peek you can refer the column which is being created in current load. You can read more about peek function here

Merging two separate ID systems to identify unique individuals

Concat

GROUP BY ISSUE

ID

Loop and Reduce