Solved: Data cleansing - reduce multiple inconsistent data... - Qlik Community

Report Inappropriate Content · ‎2016-03-06

Hi

I am struggling to load some data which is not consistent and I would appreciate your help.

I have a database with 4 fields: id1, id2, description, and branch

The identification of the customer is given by the concatenation of the fields id1 and id2. So, when loading the data, I load it as:

id1&id2 as identification

Unfortunately, for the same identification (id1&id2), each branch registers the customer's name in a subjective way. Some use abbreviations, other use partial name, etc.

As I want to have a resulting table with 2 fields: identification, description (only one of the names. any of the used names suits me, so I could peek the first one or the last one for the same identification, for example), how should I load the data?

I already tried lots of options, but without any success.

for example, using the ApplyMap would require to maintain a conversion map and that is not practical.

Any ideas?

Thank you

kaushiknsolanki · ‎2016-03-07

Hi,

You can use the firstvalue and lastvalue function to get the result.

Data:

Load * inline [

ID,Name

1,Kaushik

1,Kau

1,KAUSHIKSOLANKI

2,SOlanki

2,SOLANKIKAUSHIK

2,Sol

];

Final:

Load ID,FirstValue(Name) as FinalName

Resident Data

Group by ID;

DROP Table Data;

Regards,

Kaushik Solanki

Please remember to hit the 'Like' button and for helpful answers and resolutions, click on the 'Accept As Solution' button. Cheers!

View solution in original post

kaushiknsolanki · ‎2016-03-07

Hi,

You can use the firstvalue and lastvalue function to get the result.

Data:

Load * inline [

ID,Name

1,Kaushik

1,Kau

1,KAUSHIKSOLANKI

2,SOlanki

2,SOLANKIKAUSHIK

2,Sol

];

Final:

Load ID,FirstValue(Name) as FinalName

Resident Data

Group by ID;

DROP Table Data;

Regards,

Kaushik Solanki

Please remember to hit the 'Like' button and for helpful answers and resolutions, click on the 'Accept As Solution' button. Cheers!

undergrinder · ‎2016-03-07

Hi Rui,

you can use the autonumber function as well.

tmp:

Load

id1,

id2,

name

autonumber(id1,id2) as flg

from YourTable;

Result:

Load

*

Resident tmp

where flg=1;

drop table tmp;

The autonumber make sequential number based on the parameters, so at the first unique appearing of the id1&id2 the result will be 1, the second unique appearing will be 1 as well.

G.

Report Inappropriate Content · ‎2016-03-07

Hi Kaushik,

Thank you.

I applied it and it solved my problem

Regards,

Rui

Report Inappropriate Content · ‎2016-03-07

Hi G,

Thank you for your answer

I also applied it and it solved my problem

Regards,

Rui

Data cleansing - reduce multiple inconsistent data to unique records