Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
Am I correct in understanding that if I use AutoNumber on a key field, for instance
AutoNumber(ProductID&CustomerID)
that the symbol table for the Key Field "ProductID&CustomerID" would look like this?
Pointer Value
1
2
that is, that the Value is empty because a look-up is not needed? Or would the symbol table look like this?
Pointer Value
1 1
2 2
Also is there a way to view the symbol tables? Thank you.
It would look like neither. When a field contains only sequential integers, the index value in the table row is used as the value. So there is no symbol table. I think you might be saying this in your first example.
-Rob
Yes the Value becomes a simple integer, and the original concatenated string is no longer stored in the field.
Or try this
AutoNumberHash128(ProductID, CustomerID)
It would look like neither. When a field contains only sequential integers, the index value in the table row is used as the value. So there is no symbol table. I think you might be saying this in your first example.
-Rob
Thanks for your replies!
I did not realize there would be no symbol table at all. But it makes sense.
Should we always use AutoNumber on key fields? Or is there a drawback?
I recommend always using the AutoNumber on key fields.
I would also recommend using the AutoNumber statement rather than the AutoNumber() function. Just put the statement at the end of your script. Much easier and much faster.
-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com
Yes good point, I suppose this helps with optimized loads aswell
A related question to the topic of symbol tables.
Let's say I have 1M rows in the Fact table and a column "A" with 100 distinct values. So the data is repeating a lot for this column. Is this a problem because the symbol table will still only store the 100 distinct values?
Will it make a difference if I remove the column "A" from the Fact table, performance wise, and instead only store the 100 distinct values of A in a dimension table that is associated with the Fact table through some key field?
I guess there should be some performance advantage otherwise why would we not store all the fields in a wide fact table, instead of having dimension tables?
IMO it's rather unlikely that the outsourcing of fact-fields into extra dimension-tables will have benefits in regard to the performance. Within the most scenarios the opposite will happens.
Especially with huge data-sets and/or if the UI performance is the biggest bottleneck it might be helpful to merge all information into a single big table. That's rather seldom necessary and might be addressed after a working star-scheme data-model (officially recommended as best compromise between script/UI performance but also development/maintainability efforts) was carefully optimized.
You are correct that the symbol table will only store 100 values in either the Fact or Dimension case. The symbol table size is the same.
In terms of optimizing storage, placing a field in fact or dimension table is driven by how many fields could be in the dimension table.
In the fact table, you have 1M pointers for A. If you move A to a dimension table by itself you still have 1M pointers for A in the fact table. You will also have 100 pointers in the dimension table. This is inefficient and it's better to leave A in the fact.
If A were to move to a dimension table that contained multiple fields, there can be storage savings. For example imagine a set of ten "Customer" fields (Name, address, etc) with 100 values each. If placed in the fact, that be would 1M pointers * 10 fields. If moved to a dimension table, there would be only 1M pointers in the fact for the key field, and 100 pointers * 11 fields in the Customers dimension table.
-Rob
Thank you both for the replies, very insightful.
So I guess both the number of pointers and pointer size (and thereby the number of rows in symbol tables) matters in terms of performance.
But for fields that do not belong in dimension tables, for instance "Category" with no extra attributes. If you have 1M rows and 100 distinct values for "Category", how does the symbol table help compress this field?
You will then have 1M total pointers in the Fact (Data) table and 100 distinct pointers in the symbol table for "Category". I guess it takes less storage to store in this case a pointer of size 7, than the actual field values for "Category"?