Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi, Here is my sample table, how do I eliminate duplicate rows where A--> B and B--> A , first two rows in below example needs to be considered as one single row.
Load * inline [
testID, col1, col2
1, A, B
1, B, A
1, C, A
2, D, E
3, F, H
];
thanks in advance!
If you want A|B to be considered equal to B|A then you should consider making the rows identical in the load script.
Maybe like this:
Load testID , rangeminstring(col1,col2) as col1,rangemaxstring(col1,col2) as col2
inline [
testID, col1, col2
1, A, B
1, B, A
1, C, A
2, D, E
3, F, H
];
Hi, @Maanasa
You mean that the first two rows in the example should be considered as one single row because "A" is the same as "B"?
Then which one of the two lines should be left?
Is there a standard you have?
compare col1 with col2 and create a temp table with ordering.
create a final table with temp table data.
If you want A|B to be considered equal to B|A then you should consider making the rows identical in the load script.
Maybe like this:
Load testID , rangeminstring(col1,col2) as col1,rangemaxstring(col1,col2) as col2
inline [
testID, col1, col2
1, A, B
1, B, A
1, C, A
2, D, E
3, F, H
];
Quite good exercise @Maanasa
So, to eliminate the first two records you can do the following:
InputData:
LOAD
testID,
col1,
col2,
Hash128(col1, col2) AS Hash12,
Hash128(col2, col1) AS Hash21
;
Load * inline [
testID, col1, col2
1, A, B
1, B, A
1, C, A
2, D, E
3, F, H
];
FinalResult:
LOAD
testID,
col1,
col2
RESIDENT
InputData
WHERE
Not Exists(Hash12,Hash21)
;
DROP TABLE InputData;
This will be your end result:
I like your approach @marksouzacosta - well done!
Hi @Maanasa ,
The only addition to @Vegar 's script is, you will have to do is add Distinct while loading then you eliminate the duplicate row, Refer the attached image :
@marksouzacosta , your code is cool too, but it will completely eliminate the duplicates, where as atleast one of them is to be retrieved in the data. Also, we can simply do a mapping load too to achieve the same result :
Regards,
Rohan.
Thanks Mark!, I dont want to eliminate first two rows I want to keep one of them. this is deduping exercise. I need either one of the first two row. A-> B or B--> A.
Thank you so much! that actually worked. 🙂