Solved: Re: Duplicate rows in qlik - Qlik Community

Maanasa

Hi, Here is my sample table, how do I eliminate duplicate rows where A--> B and B--> A , first two rows in below example needs to be considered as one single row.

Load * inline [

testID, col1, col2

1, A, B

1, B, A

1, C, A

2, D, E

3, F, H

];

thanks in advance!

Vegar

If you want A|B to be considered equal to B|A then you should consider making the rows identical in the load script.

Maybe like this:

Load testID , rangeminstring(col1,col2) as col1,rangemaxstring(col1,col2) as col2

inline [

testID, col1, col2

1, A, B

1, B, A

1, C, A

2, D, E

3, F, H

];

Vegar
Qlik Community MVP

View solution in original post

MeehyeOh

Hi, @Maanasa

You mean that the first two rows in the example should be considered as one single row because "A" is the same as "B"?

Then which one of the two lines should be left?

Is there a standard you have?

Aasir

compare col1 with col2 and create a temp table with ordering.

create a final table with temp table data.

Vegar

If you want A|B to be considered equal to B|A then you should consider making the rows identical in the load script.

Maybe like this:

Load testID , rangeminstring(col1,col2) as col1,rangemaxstring(col1,col2) as col2

inline [

testID, col1, col2

1, A, B

1, B, A

1, C, A

2, D, E

3, F, H

];

Vegar
Qlik Community MVP

marksouzacosta

Quite good exercise @Maanasa

So, to eliminate the first two records you can do the following:

InputData:
LOAD
  testID, 
  col1, 
  col2,
  Hash128(col1, col2) AS Hash12,
  Hash128(col2, col1) AS Hash21
;
Load * inline [
testID, col1, col2
1, A, B
1, B, A
1, C, A
2, D, E
3, F, H
];

FinalResult:
LOAD
  testID, 
  col1, 
  col2
RESIDENT
	InputData
WHERE
	Not Exists(Hash12,Hash21)
;

DROP TABLE InputData;

This will be your end result:

Read more at Data Voyagers - datavoyagers.net

Lech_Miszkiewicz

I like your approach @marksouzacosta - well done!

cheers Lech, When applicable please mark the correct/appropriate replies as "solution" (you can mark up to 3 "solutions". Please LIKE threads if the provided solution is helpful to the problem.

Rohan

Hi @Maanasa ,

The only addition to @Vegar 's script is, you will have to do is add Distinct while loading then you eliminate the duplicate row, Refer the attached image :

@marksouzacosta , your code is cool too, but it will completely eliminate the duplicates, where as atleast one of them is to be retrieved in the data. Also, we can simply do a mapping load too to achieve the same result :

Regards,

Rohan.

Maanasa

Thanks Mark!, I dont want to eliminate first two rows I want to keep one of them. this is deduping exercise. I need either one of the first two row. A-> B or B--> A.

Maanasa

Thank you so much! that actually worked. 🙂

Duplicate rows in qlik

General Question

QlikWorld 2023