Qlik Community

QlikView App Development

Discussion Board for collaboration related to QlikView App Development.

Not applicable

Data De-duplication in QlikView

Hi,

I have a Customer Data in Excel that I use as a data source in QlikView, some of the customers are duplicated, and I want to de-duplicate the data when I extract it in QlikView.

My dedup have a few rules:

1. Dedup by Id

2. If there are two or more rows with the same Id, choose the one that have more complete data

For example

IDNameEmail

1

Robert
1John
1Joejoe@joe.com

Then "Robert" and "John" will be removed leaving only "Joe", how can I achieve it in my extraction script? (Edit Script?) Any suggestions or ideas?

Thanks..

Tags (2)
3 Replies
Not applicable

Re: Data De-duplication in QlikView

updated, try this

a:

LOAD * INLINE [

    ID, NAME, EMAIL

    1, Robert

    1, John

    1, Joe, joe@joe.com

];

LOAD Distinct

ID as id1,

NAME as name1,

  EMAIL as email1

Resident a

Where EMAIL <> '' ;

Not applicable

Re: Data De-duplication in QlikView

Sorry that is not what I want to achieve... the above example just to show that I want the record with more complete data (in this case the "joe" data have email and the other doesn't) to be the one selected, not just select the data that doesn't have email

somenathroy
Contributor III

Re: Data De-duplication in QlikView

Hi,

You may try with below script snipets:

AA:

LOAD *,

if(IsNull(Name) or Len(Trim(ID)) = 0,0,1) + if(IsNull(Email) or Len(Trim(Email)) = 0,0,1) as filled;

LOAD * Inline

[ID,          Name,          Email

1,Robert

1,          John

1,          Joe,          joe@joe.com

2,          Jou,          jou@jou.com

2,          Jon,    abc@fo.com 

];

Final:

NoConcatenate

LOAD ID,Name,Email, filled Resident AA;

Inner Join

LOAD ID, Max(filled) as filled Resident AA Group By ID;

DROP Table AA;

DROP Field filled;

Regards,

Som