Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
qlikuser233
Contributor II
Contributor II

Join data duplicating records

I am joining two data sources, the issue is that I am brining in certain columns for each and they share 4 columns with the primary key being unique identifier. The columns that are missing from Tracker 2 are populating as 0's instead of merging into one line.

Tracker: 
LOAD  Client, 
     [Client ID], 
     [Team Cd], 
     [Policy/Contract Effective Date], 
     [Policy/Contract Expiration Date], 
     [Pol/Cont Num], 
     [Captive or Non-Captive],
     [Policy/Contract Effective Date Conversion], 
     Adjustment,
     [Client ID]&[Pol/Cont Num]&[Policy/Contract Effective Date] as [Unique Identifier], 
[3133]
     [3137]
 
Outer join (Tracker)
 
Tracker_2:
LOAD A as [Client ID], 
     B as [Client], 
   C as [Pol/Cont Num]
     D as [Policy/Contract Effective Date],
     I as [Unique Identifier], 
Z as [Timing - Waiting for Registration (New/RNW or Adjustment) - Premium - COU, PAU, ACR],

Attached is a photo for one record. It splits into 2 lines, the top line is coming from tracker 2 as it has 0's where it should be filling in the information from Tracker. I have crossed out the unique identifier number but they are identical so I cannot figure out why this is creating a line for each source rather than merging them when the primary key for both sources is identical. 

Labels (1)
1 Solution

Accepted Solutions
marcus_sommer

Like @Or mentioned a join is performed against all key-fields which means that all field-values of them must be matching.

Further missing keys and/or any duplicates will have an impact on the resulting data-set and may adding respectively removing records (depending on the applied join-type). To get rid of it could be quite difficult especially if there is not enough knowledge and control in regard to the data-quality. Therefore I suggest to change the join-logic to a mapping which is often easier, more flexible and performant and without any risk to change the number of records:

Don't join - use Applymap instead - Qlik Community - 1467592

View solution in original post

12 Replies
Or
MVP
MVP

You marked out the exact parts needed to try and see what's going on, so having to go on a guess here, but... keep in mind that joins are based on all fields with identical names. There's no concept of "primary key" in a Qlik based join.

qlikuser233
Contributor II
Contributor II
Author

Everything I marked out is identical but I can redo it with sample data: 

Client ID Client Pol/Cont Num effective date Unique ID 3033 3137  
123 Crumbl 456 999 123456999 0 0  
123 Crumbl 456 999 123456999 1 587  


so 3033 and 3137 are coming from Tracker and I would expect to see one line here but it is creating 2. @Or 

qlikuser233
Contributor II
Contributor II
Author

Client ID, Pol/Cont Num, effective date and Unique ID are all shared fields but it should only be using Unique ID to match. The rest of the columns have potential duplicates so how can I force it to match based only on unique ID? @Or 

Or
MVP
MVP

These don't seem to match up - you create the unique ID by concatenating a date, but in this case it's clearly not based on that date? It's also missing at least one of the fields that has an identical name, [Policy/Contract Effective Date], which in your screen capture is showing different values between rows.

 

Or
MVP
MVP

You would want to change the field names on one side - presumably on the outer side - for anything that isn't supposed to match. Otherwise, I don't see how you'd expect to end up with one row when the field has two potential values. You could add a second run with a Group By to handle the doubled rows in whichever way you want to, though.

qlikuser233
Contributor II
Contributor II
Author

 m

Or
MVP
MVP

Again, you can't join two tables which have different values in a column and expect anything other than duplication. This is the correct result. If you want to achieve a different result, you need to use a Group By function to deal with the "duplication" and tell Qlik how you want to pick a value between the two options, or rename the columns so they don't share the same name.

marcus_sommer

You are not looking really on the table-data else on an UI view. This means the extra row might not come from the join-statement else from another association between your data.

Just comment everything else from your script unless this join-stuff. Further adding a recno() and a rowno() to the load will be quite helpful to track what happens - and then put everything in a table-box and not a table-chart. If it results really in any duplicates it would mean that there are duplicates within the sources and/or that the key-values are exactly matching together. For example identically looking data must not mandatory be equally like a real date and a per date() formatted timestamp ...

qlikuser233
Contributor II
Contributor II
Author

They don't have different values in the columns. The shared columns in each source contain identical values. The only thing I am trying to do is add columns on to the Tracker table using Unique Identifier. If I rename the columns then that would just create further duplication. Why would it not use the unique ID to pull in info from the other columns? 

 These columns are identical in both A as [Client ID], 
     C as [Pol/Cont Num],
     D as [Policy/Contract Effective Date],
     I as [Unique Identifier],  
The only new columns I need from Tracker 2 are from V-AA. @Or  If I am misunderstanding how would I use group by with the ID to achieve this?