Skip to main content
Announcements
Global Transformation Awards! Applications are now open. Submit Entry
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Data De-duplication in QlikView

Hi,

I have a Customer Data in Excel that I use as a data source in QlikView, some of the customers are duplicated, and I want to de-duplicate the data when I extract it in QlikView.

My dedup have a few rules:

1. Dedup by Id

2. If there are two or more rows with the same Id, choose the one that have more complete data

For example

IDNameEmail

1

Robert
1John
1Joejoe@joe.com

Then "Robert" and "John" will be removed leaving only "Joe", how can I achieve it in my extraction script? (Edit Script?) Any suggestions or ideas?

Thanks..

3 Replies
Not applicable
Author

updated, try this

a:

LOAD * INLINE [

    ID, NAME, EMAIL

    1, Robert

    1, John

    1, Joe, joe@joe.com

];

LOAD Distinct

ID as id1,

NAME as name1,

  EMAIL as email1

Resident a

Where EMAIL <> '' ;

Not applicable
Author

Sorry that is not what I want to achieve... the above example just to show that I want the record with more complete data (in this case the "joe" data have email and the other doesn't) to be the one selected, not just select the data that doesn't have email

somenathroy
Creator III
Creator III

Hi,

You may try with below script snipets:

AA:

LOAD *,

if(IsNull(Name) or Len(Trim(ID)) = 0,0,1) + if(IsNull(Email) or Len(Trim(Email)) = 0,0,1) as filled;

LOAD * Inline

[ID,          Name,          Email

1,Robert

1,          John

1,          Joe,          joe@joe.com

2,          Jou,          jou@jou.com

2,          Jon,    abc@fo.com 

];

Final:

NoConcatenate

LOAD ID,Name,Email, filled Resident AA;

Inner Join

LOAD ID, Max(filled) as filled Resident AA Group By ID;

DROP Table AA;

DROP Field filled;

Regards,

Som