Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I have a Customer Data in Excel that I use as a data source in QlikView, some of the customers are duplicated, and I want to de-duplicate the data when I extract it in QlikView.
My dedup have a few rules:
1. Dedup by Id
2. If there are two or more rows with the same Id, choose the one that have more complete data
For example
ID | Name | |
---|---|---|
1 | Robert | |
1 | John | |
1 | Joe | joe@joe.com |
Then "Robert" and "John" will be removed leaving only "Joe", how can I achieve it in my extraction script? (Edit Script?) Any suggestions or ideas?
Thanks..
updated, try this
a:
LOAD * INLINE [
ID, NAME, EMAIL
1, Robert
1, John
1, Joe, joe@joe.com
];
LOAD Distinct
ID as id1,
NAME as name1,
EMAIL as email1
Resident a
Where EMAIL <> '' ;
Sorry that is not what I want to achieve... the above example just to show that I want the record with more complete data (in this case the "joe" data have email and the other doesn't) to be the one selected, not just select the data that doesn't have email
Hi,
You may try with below script snipets:
AA:
LOAD *,
if(IsNull(Name) or Len(Trim(ID)) = 0,0,1) + if(IsNull(Email) or Len(Trim(Email)) = 0,0,1) as filled;
LOAD * Inline
[ID, Name, Email
1,Robert
1, John
1, Joe, joe@joe.com
2, Jou, jou@jou.com
2, Jon, abc@fo.com
];
Final:
NoConcatenate
LOAD ID,Name,Email, filled Resident AA;
Inner Join
LOAD ID, Max(filled) as filled Resident AA Group By ID;
DROP Table AA;
DROP Field filled;
Regards,
Som