Skip to main content
Announcements
Live today at 11 AM ET. Get your questions about Qlik Connect answered, or just listen in. SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Normalization of Data

Hi

we are loading the data from multiple sources. how to avoid the duplicates while loading the raw data??

Below is the sample data:

ID         Name

0111    IBM

0111    VIBM

0111    IBM CORPORATION

0422    V-SS&C TECHNOLOGIES INC

0422    V-SSC TECHNOLOGIES INC

0171     V-V-RECORDS MANAGEMENT INC

3674     V-EMC CORPORATION

3199     V- AMERICA INTERNATIONAL CORPORATION

3789    V-AIRTEL INC

3789    V-V-AIRTEL INC

7 Replies
sunny_talwar

How would you decide which Name to load. For example When ID = 0111 which Name would go? IBM, VIBM or IBM CROPORATION?

swuehl
MVP
MVP

Have a look at

Data Cleansing

In your case, create a table with customer agreed Names for each ID, then e.g. MAP the values when loading your source data.

Not applicable
Author

I have 13656 rows data. I  have to create manually????

Not applicable
Author

In this case IBM is the root data for loading.

sunny_talwar

We need to have consistency. If it is always going to be the first name then you can use FirstValue() function. Something like this:

LOAD ID,

          FirstValue(Name) as Name

FROM Source

Group By ID;

swuehl
MVP
MVP

Kishore Kumar Karakavalasa wrote:

I have 13656 rows data. I  have to create manually????

Hopefully, there is a table with agreed Names for each ID somewhere in your systems (call it a master dimension table or leading data table for that dimension), so you can just load this table as dimension table in QV.

Peter_Cammaert
Partner - Champion III
Partner - Champion III

Or, to keep it simple: decide which one of your sources is "leading". That will probably be the data source that:

  • contains the most ID-Name combinations (consider adding missing ones)
  • is easy to maintain
  • has the fewest incorrect Name values (consider correcting the ones that are wrong)

Solutions to merge master data tables from different systems rely either on

  • one system being "boss" and being maintained as such
  • or some kind of mapping table like Stefan suggested
  • or a ruleset to guide the reload engine when decisions have to be made about which Name to retain.

Peter