Skip to main content
Announcements
SYSTEM MAINTENANCE: Thurs., Sept. 19, 1 AM ET, Platform will be unavailable for approx. 60 minutes.
cancel
Showing results for 
Search instead for 
Did you mean: 
shaan007
Partner - Creator
Partner - Creator

How to flag all Duplicates of one field in Script

Hi,

I am working on data Quality issues.

for filtering purpose I need to flag all duplicate records from one particulate field.


I used a method described below.


Is there any other method- better - exist?

---------------------------------------------------------------------------
Flagging Duplicates of Field -  [Key]


data:

LOAD * Inline [

ID, Key, Flag, Date

1, A1, 1, 10

2, A1, 1, 20

3, A2, 1, 10

4, A2, 1, 30

5, A3, 1, 04

6, A4, 1, 30

7, A1, 1, 10

8, A5, 1, 30

9, A1, 1, 04

];

Map_Duplicate:

Mapping load

Key,

sum(Flag) as Frq

Resident data Group by  Key;

data2:

load *,

if(  ApplyMap('Map_Duplicate',Key) >1,'Yes', 'No') as [Is Duplicate]

Resident data;

drop Table data;

2014_11_17_19_06_07_QlikView_x64_C_Users_shaanb_Desktop_Extra_training_Flag_Duplicates_of_One_Fi.png

Message was edited by: Prasanna Balachandran

6 Replies
anbu1984
Master III
Master III

Can you explain your requirement. What is expected output?

tresesco
MVP
MVP

May be a simpler way:

data:
LOAD * Inline [
ID, Key, Flag, Date
1, A1, 1, 10
2, A1, 1, 20
3, A2, 1, 10
4, A2, 1, 30
5, A3, 1, 04
6, A4, 1, 30
7, A1, 1, 10
8, A5, 1, 30
9, A1, 1, 04
];
Join
Map_Duplicate:
//Mapping load
Load

Key,
//sum(Flag) as Frq,
If(Count(Key)>1, 'Yes', 'No') as Duplicate
Resident data Group by  Key;


//data2:
//load *,
//if(ApplyMap('Map_Duplicate',Key)>1,'Yes', 'No') as Duplicate
//
//
//Resident data;
//
//drop Table data;

shaan007
Partner - Creator
Partner - Creator
Author

for filtering purpose I need to flag all duplicate records

shaan007
Partner - Creator
Partner - Creator
Author

Excellent , Thank you

shaan007
Partner - Creator
Partner - Creator
Author

I think it should be Count(ID)>1

...

Join
Duplicate:

Load

Key,
If(Count(ID)>1, 'Yes', 'No') as Duplicate
Resident data Group by  Key;

...

anbu1984
Master III
Master III

Count(Non_nullable_field)

You can use any non nullable field. You will get same result whether you use Count(ID) or Count(Key)