Re: Normalization of Data - Qlik Community

Report Inappropriate Content · ‎2016-02-09

Hi

we are loading the data from multiple sources. how to avoid the duplicates while loading the raw data??

Below is the sample data:

ID Name

0111 IBM

0111 VIBM

0111 IBM CORPORATION

0422 V-SS&C TECHNOLOGIES INC

0422 V-SSC TECHNOLOGIES INC

0171 V-V-RECORDS MANAGEMENT INC

3674 V-EMC CORPORATION

3199 V- AMERICA INTERNATIONAL CORPORATION

3789 V-AIRTEL INC

3789 V-V-AIRTEL INC

sunny_talwar · ‎2016-02-09

How would you decide which Name to load. For example When ID = 0111 which Name would go? IBM, VIBM or IBM CROPORATION?

swuehl · ‎2016-02-09

Have a look at

Data Cleansing

In your case, create a table with customer agreed Names for each ID, then e.g. MAP the values when loading your source data.

Report Inappropriate Content · ‎2016-02-09

I have 13656 rows data. I have to create manually????

Report Inappropriate Content · ‎2016-02-09

In this case IBM is the root data for loading.

sunny_talwar · ‎2016-02-09

We need to have consistency. If it is always going to be the first name then you can use FirstValue() function. Something like this:

LOAD ID,

FirstValue(Name) as Name

FROM Source

Group By ID;

swuehl · ‎2016-02-09

Kishore Kumar Karakavalasa wrote:

I have 13656 rows data. I have to create manually????

Hopefully, there is a table with agreed Names for each ID somewhere in your systems (call it a master dimension table or leading data table for that dimension), so you can just load this table as dimension table in QV.

Peter_Cammaert · ‎2016-02-09

Or, to keep it simple: decide which one of your sources is "leading". That will probably be the data source that:

contains the most ID-Name combinations (consider adding missing ones)
is easy to maintain
has the fewest incorrect Name values (consider correcting the ones that are wrong)

Solutions to merge master data tables from different systems rely either on

one system being "boss" and being maintained as such
or some kind of mapping table like Stefan suggested
or a ruleset to guide the reload engine when decisions have to be made about which Name to retain.

Peter