Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
 Maanasa
		
			Maanasa
		
		
		
		
		
		
		
		
	
			
		
		
			
					
		Hi, Here is my sample table, how do I eliminate duplicate rows where A--> B and B--> A , first two rows in below example needs to be considered as one single row.
Load * inline [
testID, col1, col2
1, A, B
1, B, A
1, C, A
2, D, E
3, F, H
];
thanks in advance!
 Vegar
		
			Vegar
		
		
		
		
		
		
		
		
	
			
		
		
			
					
		If you want A|B to be considered equal to B|A then you should consider making the rows identical in the load script.
Maybe like this:
Load testID , rangeminstring(col1,col2) as col1,rangemaxstring(col1,col2) as col2
inline [
testID, col1, col2
1, A, B
1, B, A
1, C, A
2, D, E
3, F, H
];
 MeehyeOh
		
			MeehyeOh
		
		
		
		
		
		
		
		
	
			
		
		
			
					
		Hi, @Maanasa
You mean that the first two rows in the example should be considered as one single row because "A" is the same as "B"?
Then which one of the two lines should be left?
Is there a standard you have?
 Aasir
		
			Aasir
		
		
		
		
		
		
		
		
	
			
		
		
			
					
		compare col1 with col2 and create a temp table with ordering.
create a final table with temp table data.
 Vegar
		
			Vegar
		
		
		
		
		
		
		
		
	
			
		
		
			
					
		If you want A|B to be considered equal to B|A then you should consider making the rows identical in the load script.
Maybe like this:
Load testID , rangeminstring(col1,col2) as col1,rangemaxstring(col1,col2) as col2
inline [
testID, col1, col2
1, A, B
1, B, A
1, C, A
2, D, E
3, F, H
];
 marksouzacosta
		
			marksouzacosta
		
		
		
		
		
		
		
		
	
			
		
		
			
					
		Quite good exercise @Maanasa
So, to eliminate the first two records you can do the following:
InputData:
LOAD
  testID, 
  col1, 
  col2,
  Hash128(col1, col2) AS Hash12,
  Hash128(col2, col1) AS Hash21
;
Load * inline [
testID, col1, col2
1, A, B
1, B, A
1, C, A
2, D, E
3, F, H
];
FinalResult:
LOAD
  testID, 
  col1, 
  col2
RESIDENT
	InputData
WHERE
	Not Exists(Hash12,Hash21)
;
DROP TABLE InputData;
This will be your end result:
Read more at Data Voyagers - datavoyagers.net
Follow me on my LinkedIn | Know IPC Global at ipc-global.com
 Lech_Miszkiewic
		
			Lech_MiszkiewicI like your approach @marksouzacosta - well done!
 Rohan
		
			Rohan
		
		
		
		
		
		
		
		
	
			
		
		
			
					
		Hi @Maanasa ,
The only addition to @Vegar 's script is, you will have to do is add Distinct while loading then you eliminate the duplicate row, Refer the attached image :
@marksouzacosta , your code is cool too, but it will completely eliminate the duplicates, where as atleast one of them is to be retrieved in the data. Also, we can simply do a mapping load too to achieve the same result :
Regards,
Rohan.
 Maanasa
		
			Maanasa
		
		
		
		
		
		
		
		
	
			
		
		
			
					
		Thanks Mark!, I dont want to eliminate first two rows I want to keep one of them. this is deduping exercise. I need either one of the first two row. A-> B or B--> A.
 Maanasa
		
			Maanasa
		
		
		
		
		
		
		
		
	
			
		
		
			
					
		Thank you so much! that actually worked. 🙂
