alexis
Partner - Specialist

Help with normalization of a table

Hi,

I am a big subscriber to denormalization in QlikView and appreciate the performance advantages of using denormalized tables.

HOWEVER, in my case I am dealing with transactional data running into millions of records, and I am trying to "compact" my data model as much as I can because we are hitting performance issues.

Below is a small extract (names changed for simplicity) from a very large file with many fields, most of them long text strings. In the example, "DeclineReason" has only about 10 variants (obviously the data creator is picking from a pick list), and the same goes for "CardNetwork". Multiply these by tens of millions of rows and you have a lot of data duplication.

I wish to remove these fields from the Auth table and reference them using a numeric key. I started by doing a LOAD DISTINCT on these fields (see below), but I am stuck on how to replace the string fields with numeric keys in my final transactional table (Auth).

Any help would be appreciated.

Alexis

Auth_tmp:
LOAD MerchantID,
     TerminalID,
     AuthorisationDate,
     Amount,
     DeclineReason,
     CardNetwork
     ...
FROM <somedatasource>...

DeclineReasons:
LOAD DISTINCT
     DeclineReason                 As DeclineReasonDesc,
     AutoNumber(DeclineReason)     As DeclineReasonDesc_KEY
RESIDENT Auth_tmp;

CardNetworks:
LOAD DISTINCT
     CardNetwork                 As CardNetworkName,
     AutoNumber(CardNetwork)     As CardNetworkName_KEY
RESIDENT Auth_tmp;

Auth:
NOCONCATENATE
LOAD MerchantID,
     TerminalID,
     AuthorisationDate,
     Amount,
     ...
RESIDENT Auth_tmp;

// I guess I need to DROP FIELDS DeclineReason and CardNetwork here etc.

DROP TABLE Auth_tmp;

8 Replies
miguelbraga
Partner - Specialist III

Hi Alexis,

Can you post a snapshot sample of your data model here?

Regards,

MB

swuehl
MVP

I probably would add a qualifier to the AutoNumber, like

AutoNumber(DeclineReason, 'Decline')

The second parameter names a separate counter instance, so DeclineReason and CardNetwork values are numbered in independent series. I think it should then be enough to use the same expression in the Auth table:

Auth:
NOCONCATENATE
LOAD MerchantID,
     TerminalID,
     AuthorisationDate,
     Amount,
     AutoNumber(DeclineReason, 'Decline') as DeclineReasonDesc_KEY,
     ...
RESIDENT Auth_tmp;

No reason to drop fields if you already leave them out in your resident load.

But I wonder if this will make any real difference: in your original Auth table the values are not stored as strings anyway, but as bit-stuffed pointers into the symbol table. I think you won't gain anything.
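To put rough numbers on that (an illustrative back-of-envelope estimate, not a figure from the thread): with about 10 distinct DeclineReason values, the symbol table stores those 10 strings once, and each row of the data table holds only a bit-stuffed pointer of ceil(log2(10)) = 4 bits. Ten million rows therefore need roughly 40 million bits, about 5 MB, regardless of how long the strings themselves are. An AutoNumber key would still have 10 distinct values and thus a pointer of the same width per row, so the fact table would not shrink.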

alexis
Partner - Specialist
Author

Hi Braga

Thanks for responding.

In the context of this example I don't have a data model; it is a SINGLE table ("Auth_tmp" in the example). I simply wish to remove the long, repeated string fields (DeclineReason and CardNetwork), put them in separate tables ("DeclineReasons" and "CardNetworks" in the example), and link to them using a numeric key...

Any assistance would be appreciated

Alexis

Anonymous
Not applicable

From the script you have posted I can see no reason to create the DeclineReasons, CardNetworks and Auth tables; that just normalizes into several tables data that is already denormalized into a single table.

As suggested earlier, could you share your data model? That may make it easier to understand.

alexis
Partner - Specialist
Author

Hi Bill and Braga

Here is the schema I ended up with, with swuehl's help. As stated, this part of the puzzle is just a SINGLE table with a lot of repeated, duplicated text fields, which I wanted to remove to keep my main table as small as possible, since I do a lot of calculations, aggregations and comparative analysis on it.

As swuehl seems to suggest, maybe this is a fruitless exercise, as QlikView is "clever" enough to detect all the duplication and uses bit-stuffed pointers into the symbol table. I'll test it both ways (normalised and denormalised) and see if it makes a difference.

Thanks

Alexis

(Attachment: TestSchema.jpg)

alexis
Partner - Specialist
Author

Hi swuehl,

Thanks for your reply. As I stated in my reply to Bill, I am trying to "optimise" my application (it's many times more complex than my example 🙂 ), and I had assumed that the millions of repeated strings might not be helping. But as you say, maybe I'm wasting my time looking in this direction, since QlikView detects this and uses pointers to a symbol table for optimisation...

Thanks again

alexis

sauliusr
Contributor II

If the DeclineReason and CardNetwork fields have only a few distinct values, there is no benefit in moving them to separate tables (normalizing). You will not save any space or gain any performance. More info: Symbol Tables and Bit-Stuffed Pointers
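If you want to sanity-check how many distinct values those fields really have, a small snippet like the one below can log the counts after the Auth load. This is an illustrative sketch using the field names from the thread; FieldValueCount() returns the number of distinct values in a loaded field, and TRACE writes to the script progress log.

// Illustrative sketch: log the distinct-value counts after loading Auth
LET vDeclineCount = FieldValueCount('DeclineReason');
LET vNetworkCount = FieldValueCount('CardNetwork');
TRACE DeclineReason has $(vDeclineCount) distinct values;
TRACE CardNetwork has $(vNetworkCount) distinct values;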

alexis
Partner - Specialist
Author

Thanks Saulius, that was very helpful. I will look elsewhere for my optimisation!

rgds

Alexis