Skip to main content
Announcements
Qlik Introduces a New Era of Visualization! READ ALL ABOUT IT
cancel
Showing results for 
Search instead for 
Did you mean: 
pranaview
Creator III
Creator III

Getting duplicate records after using Joins for three different tables

Hi All,

Requirement: I have to match the OrderID from the Table 1 in the Table 2 and then for the respective AssignmentID in the Table 2, I need to find a match in the Table 3 to complete the connection between the three tables to get the Post Date and Creation Date for each OrderID which i need for the calculations. Now I tried to apply Left Join between Table 1 and Table 2 and then again a Left Join between the resulting table and the Table 3. This works for unique values in the key fields but in real time, it is fairly common to have multiple records for the same OrderID with different Post Dates, DocNumber etc and similarly multiple records for the same AssignmentID with different Creation Date/DeliveryNumber, you get the gist of it. 

For Ex: If there are two records for the same OrderID in Table 1 and have two records in the Table 3 for the respective AssignmentID, then after applying the Joins, I will have 4 records in the resulting table rather than just 2 i.e. a duplicate for each record. What i want is to have both Creation Date and Post Date in my final table for each record in the Table 1 without any duplicates. I couldn't use ApplyMap() because of the duplicates in the Key fields(or is there a workaround?)

Below is a simple example of how the data looks and how it is connected.Please help me out here to get the desired result as without getting this done, there's no moving forward. Let me know if the problem is not clear enough

 

Table 1

OrderID        Post Date        DocNumnber       ....More Columns

1234                06/23/19          98273445

1235                06/23/19          98273450

1235                06/24/19          98273493

 

Table 2

OrderID        AssignmentID     ....More Columns

1234                 873         

1235                 874     

 

Table 3

AssignmentID        Creation Date      DeliveryNumber

873                                  06/23/19              98749335

874                                  06/23/19              98749337

874                                  06/25/19              98749342

 

Thanks!

Pranav

Labels (5)
1 Solution

Accepted Solutions
chrismarlow
Specialist II
Specialist II

Hi Pranav,

Try the script below;

20190709_1.png

Cheers,

Chris.

order_date:
LOAD * INLINE [
OrderID,Post Date,DocNumnber
1234,06/23/19,98273445
1235,06/23/19,98273450
1235,06/24/19,98273493
];

order_assignment:
LOAD * INLINE [
OrderID,AssignmentID
1234,873
1235,874
];

assignment_date:
LOAD * INLINE [
AssignmentID,Creation Date,DeliveryNumber
873,06/23/19,98749335
874,06/23/19,98749337
874,06/25/19,98749342
];



order_date_2:
NoConcatenate
Load
	If(RowNo()=1,
		1,
		If(Peek(OrderID)<>OrderID,
			1,
			Peek(Counter)+1
			)
		) AS Counter,
	*
Resident order_date
order by OrderID,[Post Date];

assignment_date_2:
NoConcatenate
Load
	If(RowNo()=1,
		1,
		If(Peek(AssignmentID)<>AssignmentID,
			1,
			Peek(Counter)+1
			)
		) AS Counter,
	*
Resident assignment_date
Order by AssignmentID,[Creation Date];
	
left join (order_date_2)
Load
	*
Resident order_assignment;

left join (order_date_2)
Load
	*
Resident assignment_date_2; 

drop tables order_date, assignment_date, order_assignment, assignment_date_2;

View solution in original post

7 Replies
chrismarlow
Specialist II
Specialist II

Hi,

The answer depends on what rule you want to apply to decide which is the right creation date.

So a solution might be that the dates should be ordered - i.e. your outcome would be;

OrderID Post DateDocNumnber AssignmentIDCreation Date
1234 06/23/19 9827344587306/23/19 
1235 06/23/19 9827345087406/23/19 
1235 06/24/19 9827349387406/25/19 

 

Or the first/last date could be chosen, and the scripting would be very different. It depends on what your business rules are, what do your users expect to see?

Cheers,

Chris.

pranaview
Creator III
Creator III
Author

Hi Chris 

I did try ordering the tables by date before joining them but it would still give me 4 records for OrderID 1235 rather than 2 after the Join(which how a Join would work) and i can't actually identify the duplicates as there's no pattern because duplicates are not restricted to just 2 or 3.So, I also thought of using AssignmentID & rowno() as a unique key but can't use this approach because Count(OrderID) won't be equal to Count(AssignmentID) in real time. I think the only way to do this is if I had another field which is unique and common between Table 1 and Table 3 which i don't have unfortunately.

Regarding the rule : So you're right like for duplicates, OrderID with the oldest Post Date should be mapped with the AssignmentID with the oldest Creation Date

Thanks for the suggestion though.

Pranav

chrismarlow
Specialist II
Specialist II

Hi Pranav,

As your dates can be different then you need to add a counter to the tables with dates using an if/peek/rowno pattern that resets with each set of IDs.

You then make the counter part of your join so join on combination of counter and ID.

There will be an example somewhere on community, or I will dig one out for you.

Cheers,

Chris.

chrismarlow
Specialist II
Specialist II

Hi Pranav,

Try the script below;

20190709_1.png

Cheers,

Chris.

order_date:
LOAD * INLINE [
OrderID,Post Date,DocNumnber
1234,06/23/19,98273445
1235,06/23/19,98273450
1235,06/24/19,98273493
];

order_assignment:
LOAD * INLINE [
OrderID,AssignmentID
1234,873
1235,874
];

assignment_date:
LOAD * INLINE [
AssignmentID,Creation Date,DeliveryNumber
873,06/23/19,98749335
874,06/23/19,98749337
874,06/25/19,98749342
];



order_date_2:
NoConcatenate
Load
	If(RowNo()=1,
		1,
		If(Peek(OrderID)<>OrderID,
			1,
			Peek(Counter)+1
			)
		) AS Counter,
	*
Resident order_date
order by OrderID,[Post Date];

assignment_date_2:
NoConcatenate
Load
	If(RowNo()=1,
		1,
		If(Peek(AssignmentID)<>AssignmentID,
			1,
			Peek(Counter)+1
			)
		) AS Counter,
	*
Resident assignment_date
Order by AssignmentID,[Creation Date];
	
left join (order_date_2)
Load
	*
Resident order_assignment;

left join (order_date_2)
Load
	*
Resident assignment_date_2; 

drop tables order_date, assignment_date, order_assignment, assignment_date_2;
pranaview
Creator III
Creator III
Author

Hi Chris,

Thanks for the script man!

I tested it for all the possible scenarios i could think of and it hold up good.

Thanks for saving the day.

Pranav

pam1990
Contributor III
Contributor III

@chrismarlow  can you explain why this was the route to take and what it is doing?

I think I can use this for a similar problem, but I need to better understand what is happening since I have more fields than the original issue.

Thanks

chrismarlow
Specialist II
Specialist II

Hi,

So crux of this is there is an allocation question where we have records in order_date and assignment_date where we would like to match off against each other but there is no unique key to join on.

It looks a bit more complicated than it is as there is also order_assignment as a mapping, but as that is one to one, so other than apply the mapping we don't need to worry about it.

So in the back & forth we established that an acceptable business rule was to order by date in this case, so for each set or OrderID (equivalent to AssignmentID) the first Post Date would be matched with the first Creation Date and so on.

It is easier to do this by creating a counter, so marking that first pair on each side with 1 etc, than trying to do something with the dates first. The preceeding load (with the If/RowNo/Peek) and sorting adds these to the two sides & then can be joined.

If you drop different sets of tables at the end then the intermediate tables might help show what is going on better than I've explained it. I would say take the drop statement out, but QlikView then does something horrible with synthetic keys that makes it harder to follow 😀.

Shout if that does not help.

Cheers,

Chris.