Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Data cleansing - reduce multiple inconsistent data to unique records

Hi

I am struggling to load some data which is not consistent and I would appreciate your help.

I have a database with 4 fields: id1, id2, description, and branch

The identification of the customer is given by the concatenation of the fields id1 and id2. So, when loading the data, I load it as:

id1&id2 as identification

Unfortunately, for the same identification (id1&id2), each branch registers the customer's name in a subjective way. Some use abbreviations, other use partial name, etc.

As I want to have a resulting table with 2 fields: identification, description (only one of the names. any of the used names suits me, so I could peek the first one or the last one for the same identification, for example), how should I load the data?

I already tried lots of options, but without any success.

for example, using the ApplyMap would require to maintain a conversion map and that is not practical.

Any ideas?

Thank you

1 Solution

Accepted Solutions
kaushiknsolanki
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi,

You can use the firstvalue and lastvalue function to get the result.

Data:

Load * inline [

ID,Name

1,Kaushik

1,Kau

1,KAUSHIKSOLANKI

2,SOlanki

2,SOLANKIKAUSHIK

2,Sol

];

Final:

Load ID,FirstValue(Name) as FinalName

Resident Data

Group by ID;

DROP Table Data;

Regards,

Kaushik Solanki

Please remember to hit the 'Like' button and for helpful answers and resolutions, click on the 'Accept As Solution' button. Cheers!

View solution in original post

4 Replies
kaushiknsolanki
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi,

You can use the firstvalue and lastvalue function to get the result.

Data:

Load * inline [

ID,Name

1,Kaushik

1,Kau

1,KAUSHIKSOLANKI

2,SOlanki

2,SOLANKIKAUSHIK

2,Sol

];

Final:

Load ID,FirstValue(Name) as FinalName

Resident Data

Group by ID;

DROP Table Data;

Regards,

Kaushik Solanki

Please remember to hit the 'Like' button and for helpful answers and resolutions, click on the 'Accept As Solution' button. Cheers!
undergrinder
Specialist II
Specialist II

Hi Rui,

you can use the autonumber function as well.

tmp:

Load

     id1,

     id2,

     name

     autonumber(id1,id2) as flg

from YourTable;

Result:

Load

     *

Resident tmp

where flg=1;

drop table tmp;

The autonumber make sequential number based on the parameters, so at the first unique appearing of the id1&id2 the result will be 1, the second unique  appearing will be 1 as well.

G.

Not applicable
Author

Hi Kaushik,

Thank you.

I applied it and it solved my problem

Regards,

Rui

Not applicable
Author

Hi G,

Thank you for your answer

I also applied it and it solved my problem

Regards,

Rui