Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Join us in NYC Sept 4th for Qlik's AI Reality Tour! Register Now
cancel
Showing results for 
Search instead for 
Did you mean: 
QlikWilliam
Contributor II
Contributor II

Structure of Symbol Tables using AutoNumber on Key fields

Hello,

Am I correct in understanding that if I use AutoNumber on a key field, for instance

AutoNumber(ProductID&CustomerID)

that the symbol table for the Key Field "ProductID&CustomerID" would look like this?

Pointer   Value

1

2

that is, that the Value is empty because a look-up is not needed? Or would the symbol table look like this?

Pointer   Value

1             1

2             2

Also is there a way to view the symbol tables? Thank you. 

Labels (2)
1 Solution

Accepted Solutions
rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

It would look like neither. When a field contains only sequential integers, the index value in the table row is used as the value. So there is no symbol table. I think you might be saying this in your first example.

-Rob

View solution in original post

12 Replies
Chanty4u
MVP
MVP

Yes the Value becomes a simple integer, and the original concatenated string is no longer stored in the field.

 

Or  try this 

AutoNumberHash128(ProductID, CustomerID)

 

 

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

It would look like neither. When a field contains only sequential integers, the index value in the table row is used as the value. So there is no symbol table. I think you might be saying this in your first example.

-Rob

QlikWilliam
Contributor II
Contributor II
Author

Thanks for your replies!

I did not realize there would be no symbol table at all. But it makes sense.

Should we always use AutoNumber on key fields? Or is there a drawback?

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

I recommend always using the AutoNumber on key fields.

I would also recommend using the AutoNumber statement rather than the AutoNumber() function. Just put the statement at the end of your script. Much easier and much faster.

-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com

Qlik_William
Contributor III
Contributor III

Yes good point, I suppose this helps with optimized loads aswell

Qlik_William
Contributor III
Contributor III

A related question to the topic of symbol tables.

Let's say I have 1M rows in the Fact table and a column "A" with 100 distinct values. So the data is repeating a lot for this column. Is this a problem because the symbol table will still only store the 100 distinct values?

Will it make a difference if I remove the column "A" from the Fact table, performance wise, and instead only store the 100 distinct values of A in a dimension table that is associated with the Fact table through some key field?

I guess there should be some performance advantage otherwise why would we not store all the fields in a wide fact table, instead of having dimension tables?

@rwunderlich @Chanty4u 

marcus_sommer

IMO it's rather unlikely that the outsourcing of fact-fields into extra dimension-tables will have benefits in regard to the performance. Within the most scenarios the opposite will happens.

Especially with huge data-sets and/or if the UI performance is the biggest bottleneck it might be helpful to merge all information into a single big table. That's rather seldom necessary and might be addressed after a working star-scheme data-model (officially recommended as best compromise between script/UI performance but also development/maintainability efforts) was carefully optimized. 

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

You are correct that the symbol table will only store 100 values in either the Fact or Dimension case. The symbol table size is the same. 

In terms of optimizing storage, placing a field in fact or dimension table is driven by how many fields could be in the dimension table. 

In the fact table, you have 1M pointers for A. If you move A to a dimension table by itself you still have 1M pointers for A in the fact table. You will also have 100 pointers in the dimension table. This is inefficient and it's better to leave A in the fact.

If A were to move to a dimension table that contained multiple fields, there can be storage savings. For example imagine a set of ten "Customer" fields (Name, address, etc) with 100 values each. If placed in the fact, that be would 1M pointers * 10 fields. If moved to a dimension table, there would be only 1M pointers in the fact for the key field, and 100 pointers * 11 fields in the Customers dimension table.

-Rob

 

Qlik_William
Contributor III
Contributor III

Thank you both for the replies, very insightful.

So I guess both the number of pointers and pointer size (and thereby the number of rows in symbol tables) matters in terms of performance.

But for fields that do not belong in dimension tables, for instance "Category" with no extra attributes. If you have 1M rows and 100 distinct values for "Category", how does the symbol table help compress this field? 

You will then have 1M total pointers in the Fact (Data) table and 100 distinct pointers in the symbol table for "Category". I guess it takes less storage to store in this case a pointer of size 7, than the actual field values for "Category"?