Re: Data De-duplication in QlikView - Qlik Community

Report Inappropriate Content · ‎2013-04-01

Hi,

I have a Customer Data in Excel that I use as a data source in QlikView, some of the customers are duplicated, and I want to de-duplicate the data when I extract it in QlikView.

My dedup have a few rules:

1. Dedup by Id

2. If there are two or more rows with the same Id, choose the one that have more complete data

For example

ID	Name	Email
1	Robert
1	John
1	Joe	joe@joe.com

Then "Robert" and "John" will be removed leaving only "Joe", how can I achieve it in my extraction script? (Edit Script?) Any suggestions or ideas?

Thanks..

Report Inappropriate Content · ‎2013-04-01

updated, try this

a:

LOAD * INLINE [

ID, NAME, EMAIL

1, Robert

1, John

1, Joe, joe@joe.com

];

LOAD Distinct

ID as id1,

NAME as name1,

EMAIL as email1

Resident a

Where EMAIL <> '' ;

Report Inappropriate Content · ‎2013-04-01

Sorry that is not what I want to achieve... the above example just to show that I want the record with more complete data (in this case the "joe" data have email and the other doesn't) to be the one selected, not just select the data that doesn't have email

somenathroy · ‎2013-04-01

Hi,

You may try with below script snipets:

AA:

LOAD *,

if(IsNull(Name) or Len(Trim(ID)) = 0,0,1) + if(IsNull(Email) or Len(Trim(Email)) = 0,0,1) as filled;

LOAD * Inline

[ID, Name, Email

1,Robert

1, John

1, Joe, joe@joe.com

2, Jou, jou@jou.com

2, Jon, abc@fo.com

];

Final:

NoConcatenate

LOAD ID,Name,Email, filled Resident AA;

Inner Join

LOAD ID, Max(filled) as filled Resident AA Group By ID;

DROP Table AA;

DROP Field filled;

Regards,

Som