Actually, your flat table structure might work better than a star schema in QlikView. When QlikView builds the tables in memory via the script, it stores the information in each table as bit-stuffed indexes plus lookup (symbol) tables - this is one of the ways it reduces the size of the data. See this article here for more information.
Creating an ID code and a lookup for a text field may actually use more memory than just leaving the text field in, as QV has to create a "lookup" for the text and then an additional "lookup" for the code, rather than just the one.
Because of this, calculations often work quicker if the components are all in the same table, rather than reaching across lookup tables.
However, this obviously depends on the rest of the data loaded; if you want to add more information to the lookups etc. then this might not be the case.
After loading the big table, you can create a lookup by loading resident from it and using DISTINCT to get the distinct values:
LookupTable: LOAD DISTINCT ShopID, ShopName RESIDENT BigTable;
This should be quicker than loading from the source, as QlikView will already have the data in memory.
Theoretically yes - and don't forget it will take more time in the script to build the lookups.
I imagine that after a certain point, though, the more information you need in a lookup (e.g. if your customer lookup needs many more fields such as telephone number or address), the more likely it becomes that a separate lookup table is the better option rather than keeping it all in the big table.
See this interesting article (again, by the fab Henric) where he disproves the myth that counting distinct values is slower than summing a counter field held in a lookup table; it gives more of a feel for the issues we get when putting things into separate tables.
I forgot to mention that looping through the values of a field using FieldValue() is even quicker than loading DISTINCT, but I'm not sure how that would work when you need the distinct combinations of two fields. The way I use it (for one field) is:
DistinctValues: LOAD FieldValue('FieldName', IterNo()) AS distinct_FieldName
AUTOGENERATE 1 WHILE Len(FieldValue('FieldName', IterNo())) > 0;
This returns the text value that QlikView has stored for the field FieldName at position IterNo(),
i.e. for IterNo() = 2 it will return the second stored value (which it has already indexed).
NB the "len(...)" statement is important as it cuts the iterations when we have run out of values for fieldname!
I will look into this and try to test some different things.
When working with cubes and Reporting Services, a star schema is the best-practice solution in my view - but QlikView handles data differently, so I'm absolutely willing to try other methods. I'll have a go at it and maybe return if I have some questions.
I subscribe to Erica's first suggestion. Leave everything in this one big table, and your performance will be excellent because no associative links need to be traced. Dimensions will operate just the same, whether in a separate table or embedded in the Facts table.
Only one exception: you may want to create a Master Calendar for your CreateDate field, as there may be holes in that field.
But this is only a starting point, you were saying?
My table consists of 1.7 billion rows of transactions.
My CreateDate should be my key to the Master Calendar (maybe I'll create a new key in the table without the timestamp and so on) - what do you think the best solution would be?
And yes, this is only a starting point. This one big table contains most of the data needed, but later on there can be a new table that might need to be joined on, or something else.
Yes - the calendar part I've got covered.
I'll sure have to look into how I do the lookups and how I get the key into my "bigfacttable" instead of the real value.
Maybe: do a load of the big table, LOAD DISTINCT some values into a dimension table, build a lookup between the dimension and the big table to create a new "big table", and then drop the first one so that the new big table only consists of keys and values.
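Something like the following rough sketch, assuming hypothetical names (BigTable as the source, ShopName as the text field being replaced, Amount and CreateDate as the remaining fact fields):

// Build the dimension from the big table's own values
// (AutoNumber gives the same synthetic integer key for the same text value)
Shops:
LOAD DISTINCT
    AutoNumber(ShopName) AS ShopID,
    ShopName
RESIDENT BigTable;

// Rebuild the fact table with the key instead of the text value
NewBigTable:
LOAD
    AutoNumber(ShopName) AS ShopID,
    Amount,
    CreateDate
RESIDENT BigTable;

// Drop the original so only keys and measures remain in the fact table
DROP TABLE BigTable;

Whether this actually saves memory is exactly the trade-off discussed above, so it's worth testing both ways before committing to it.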
To make sure that your end-users still say hello to you in the morning, consider choosing from the following:
- Aggregate whatever can be aggregated, and drop the historical records that nobody needs, or
- Take an opportunistic view of the data model, and use whatever gives you acceptable performance. For instance, do a comparison by testing a single-table model against a multiple-table model with, say, 10 million rows (see the sketch below).
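For that comparison test, one quick way to cap the sample size is the FIRST load prefix (the QVD name here is just a placeholder):

// Load only the first 10 million rows of the source for the model comparison
SampleFacts:
FIRST 10000000
LOAD * FROM Transactions.qvd (qvd);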
Indeed, a standard Master Calendar can be built and coupled to the transaction table by loading Floor([CreateDate]) AS CreateDate. If the design doesn't need a day-level calendar, reduce the granularity to YearMonth or YearQuarter.
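A rough sketch of that pattern, assuming the fact table is called Facts and already loads Floor(CreateDate) AS CreateDate as its date key (field and variable names are placeholders):

// Find the date range actually present in the facts
MinMax:
LOAD Min(CreateDate) AS MinDate, Max(CreateDate) AS MaxDate RESIDENT Facts;
LET vMinDate = Num(Peek('MinDate'));
LET vMaxDate = Num(Peek('MaxDate'));
DROP TABLE MinMax;

// Generate one row per day; the preceding load derives the calendar attributes
MasterCalendar:
LOAD
    CreateDate,
    Year(CreateDate)                        AS Year,
    Month(CreateDate)                       AS Month,
    Date(MonthStart(CreateDate), 'YYYY-MM') AS YearMonth,
    'Q' & Ceil(Num(Month(CreateDate)) / 3)  AS Quarter;
LOAD
    Date($(vMinDate) + IterNo() - 1) AS CreateDate
AUTOGENERATE 1
WHILE $(vMinDate) + IterNo() - 1 <= $(vMaxDate);

Dropping the attribute fields down to YearMonth or YearQuarter is then just a matter of removing the finer-grained columns from both the facts key and the calendar.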
Haha - we have discussed the level of aggregation, and sadly we need everything at transaction level.
But good idea to try splitting the data into large segments to see whether it performs better. Maybe even split it out into several QVW documents and then combine them into a QVD load? However, I only need to do the full load once; the rest can be incremental loads.
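For the incremental part, a rough sketch of the usual QVD pattern, assuming an existing source connection, a key field TransactionID, the CreateDate field, and a variable vLastReload holding the previous reload time (all names and the file path are placeholders):

// 1. Pull only the records created since the last reload from the source
Transactions:
SQL SELECT TransactionID, ShopID, Amount, CreateDate
FROM Transactions
WHERE CreateDate >= '$(vLastReload)';

// 2. Append the history already stored in the QVD, skipping keys just loaded
CONCATENATE (Transactions)
LOAD * FROM Transactions.qvd (qvd)
WHERE NOT Exists(TransactionID);

// 3. Write the combined table back for the next run
STORE Transactions INTO Transactions.qvd (qvd);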