Not applicable

Flat table structure to star schema

Hello,

I would like to know what the best practice is for building a star schema in QlikView when your starting point is one big flat table.

E.g. I have this table loaded into QlikView:

ENTRYID,CreateDate,ShopID,ShopName,CustomerID,CustomerName,Amount,Taxes

What is the best practice for splitting this up into dimensions and facts?

Let's say I want to create a ShopDimension, a CustomerDimension and a DateDimension.

Would I then do a resident load from the big table and just create a surrogate key? Or what do you think is best?

13 Replies
Not applicable
Author

Hi Thomas

Actually, your flat table structure might work better than a star schema in QlikView. When QlikView builds its in-memory tables from the script, it stores each field's distinct values once in a symbol table and keeps only bit-stuffed pointers to those values in the data table - this is one of the ways it reduces the size of the data. See this article for more information.

http://community.qlik.com/blogs/qlikviewdesignblog/2012/11/20/symbol-tables-and-bit-stuffed-pointers

Creating an ID code and a lookup for a text field may actually use more memory than just leaving the text field in, as QV has to create a "lookup" for the text and then an additional "lookup" for the code, rather than just the one.

Because of this, calculations often work quicker if the components are all in the same table, rather than reaching across lookup tables.

However, this obviously depends on the rest of the data loaded. If you want to add more information to the lookups etc. then this might not be the case.

Anyway

After loading in the big table, you can create a lookup by loading resident from it, and using DISTINCT to get the distinct values:

LookupTable: Load distinct ShopID,ShopName resident BigTable;

This should be quicker than loading from source, as QlikView already has the data in memory.
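To make the resident-load idea concrete for the table in the question, here is a minimal sketch. It assumes the flat table was loaded under the name BigTable (as in the line above); the table names ShopDim and CustomerDim are made up for illustration. Note that the descriptive fields have to be removed from the big table afterwards, otherwise each dimension would share two fields with it and QlikView would build a synthetic key.

```qlikview
// Assumption: the flat table is already loaded and is called BigTable.
// ShopDim and CustomerDim are illustrative table names.
ShopDim:
LOAD DISTINCT
    ShopID,
    ShopName
RESIDENT BigTable;

CustomerDim:
LOAD DISTINCT
    CustomerID,
    CustomerName
RESIDENT BigTable;

// Drop the descriptive fields from the fact table so that each
// dimension links on its ID field only (no synthetic keys).
DROP FIELDS ShopName, CustomerName FROM BigTable;
```

Whether this split actually beats the single flat table is worth measuring on your own data, for the reasons given above.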

Erica

Not applicable
Author

Hi Erica,

Thanks for the input. I see your points and I will try it out.

But basically QlikView loads faster because it can compress the data in memory, and lookup tables will consume more memory due to the extra lookups?

Not applicable
Author

I forgot to mention that looping through the values of the field using fieldvalue() is even quicker than loading distinct, but I'm not sure how that would work when you need two distinct values. The way I use it (for one field) is:

Lu_Field:
Load fieldvalue('FieldName',iterno()) as distinct_FieldName
Autogenerate 1
while len(fieldValue('FieldName',iterno()));

This returns the text value that QlikView has stored in the field FieldName at position iterno().

I.e. for iterno() = 2 it will return the second stored value (which it has already indexed).

NB the "len(...)" condition is important, as it stops the iterations when we have run out of values for FieldName!
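A variant of the same idea, sketched here as an alternative rather than as the method described above, asks QlikView for the number of distinct values up front with FieldValueCount() instead of testing len() on every iteration. ShopID is used only as an example field (it must already be loaded), and ShopList and DistinctShopID are illustrative names.

```qlikview
// Assumption: the field ShopID has already been loaded by an earlier statement.
// ShopList and DistinctShopID are illustrative names.
ShopList:
LOAD
    FieldValue('ShopID', RecNo()) AS DistinctShopID
AUTOGENERATE FieldValueCount('ShopID');
```

One possible advantage is that the loop length no longer depends on the values themselves, so an empty string stored in the field should not end the load early. Like the version above, it reads the field's stored values, so it returns the values from every table that contains that field.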

Regards,

Erica

Not applicable
Author

Theoretically yes - and don't forget that building the lookups will take more time in the script.

I imagine that after a certain point though, the more information you need in a lookup (e.g. if your customer lookup needs many more fields, such as telephone number or address), the more likely it is that a separate lookup table is the better place for it, rather than the flat table.

See this interesting article (again, by the fab Henric) where he disproves the myth that counting distinct is slower than summing a counter field in a lookup table. It gives more of a feel for the issues we get when putting things into separate tables.

http://community.qlik.com/blogs/qlikviewdesignblog/2013/10/22/a-myth-about-countdistinct

Erica


Not applicable
Author

Thanks Erica,

I will look into this and try to test some different things.

When working with cubes and Reporting Services, a star schema is the best-practice solution in my view - but QlikView handles data differently, so I'm absolutely willing to try other methods. I'll give it a go and maybe return if I have some questions.

Peter_Cammaert
Partner - Champion III

I subscribe to Erica's first suggestion. Leave everything in this one big table, and your performance will be excellent because no associative links need to be traced. Dimensions will operate just the same, whether in a separate table or embedded in the Facts table.

Only one exception: you may want to create a Master Calendar for your CreateDate field, as there may be holes in that field.
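A minimal sketch of such a master calendar follows. It assumes the flat table is called BigTable and that CreateDate holds whole dates with no time part; TempRange, vMinDate, vMaxDate and MasterCalendar are illustrative names. The calendar gets one row for every day between the first and last CreateDate, which is what fills the holes.

```qlikview
// Assumptions: BigTable is loaded and CreateDate contains whole dates.
// TempRange, vMinDate, vMaxDate and MasterCalendar are illustrative names.
TempRange:
LOAD
    Min(CreateDate) AS MinDate,
    Max(CreateDate) AS MaxDate
RESIDENT BigTable;

LET vMinDate = Num(Peek('MinDate', 0, 'TempRange'));
LET vMaxDate = Num(Peek('MaxDate', 0, 'TempRange'));
DROP TABLE TempRange;

// One row per day in the range; links to BigTable on CreateDate.
MasterCalendar:
LOAD
    Date($(vMinDate) + IterNo() - 1)  AS CreateDate,
    Year($(vMinDate) + IterNo() - 1)  AS Year,
    Month($(vMinDate) + IterNo() - 1) AS Month
AUTOGENERATE 1
WHILE $(vMinDate) + IterNo() - 1 <= $(vMaxDate);
```

On a very large table the Min/Max pass over the resident data can itself take a while, so it may be worth timing.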

But this is only a starting point, you were saying?

Best,

Peter

Not applicable
Author

Yes it would - in my mind a star schema is just easier to read, because you have a data model where the dimensions link to your fact table through all the keys needed, and thereby your data is connected.

Not applicable
Author

My table consists of 1.7 billion rows of transactions.

My CreateDate should be the key to my master calendar (maybe I'll create a new key in the table without the timestamp and so on) - what do you think the best solution would be?

And yes, this is only a starting point. This one big table contains most of the data needed, but later on there may be a new table that needs to be joined on, or something else.
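One way the key without a timestamp could be derived is sketched below. It assumes CreateDate is loaded as a proper timestamp and that the flat table is called BigTable; Facts and CreateDateKey are hypothetical names. Floor() strips the time part, so every transaction on the same day shares one key value that a calendar table can link to.

```qlikview
// Assumptions: BigTable is loaded and CreateDate is a timestamp.
// Facts and CreateDateKey are hypothetical names.
Facts:
LOAD
    *,
    Date(Floor(CreateDate)) AS CreateDateKey
RESIDENT BigTable;

// The original table is no longer needed once the key exists.
DROP TABLE BigTable;
```

With 1.7 billion rows a second pass over the resident table is expensive, so in practice the same expression would more likely go straight into the original LOAD from source.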

Not applicable
Author

I am used to working in SQL, so a star schema is intuitive to me. But QV works so differently from a "traditional" database that these assumptions don't always hold!