Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Karl Pover makes the assertion in his blog Is AutoNumber() really worth the hassle? | Harvesting Wisdom that links between tables are done using the record pointer and therefore there is no difference in link structure between a string or an autonumbered key. Can anyone confirm this or provide a different opinion?
I understand the RAM savings from autonumber(). I'm asking specifically about differences in link structure and possibly performance.
-Rob
Whether you have links that are strings, or links that are integers, should make no difference whatsoever in response time. In both cases, the evaluation (when you click) are made through the binary bit-stuffed pointers.
There are however two differences:
@ AUNEZ FABRICE If you have different cardinality in two tables, the symbol table will be created from the union of values in the two tables, and this (new) number of values will determine the length of the bit-stuffed pointers for all tables where this key is found. In other words, the symbol table is re-calculated when a field is encountered a second time. And so are the bit-stuffed pointers of the previous tables.
HIC
Hi Rob, Maybe we can get some input from Henric Cronström since my understanding of this comes from a discussion I had with him 2 years ago in Qonnections.
And while we wait for an official written answer and to keep the discussion active until we get the answer, I've been thinking how I could test a link's performance in a controlled random data model. Removing the impact that less RAM usage has on performance to confirm whether integers have a better link performance than strings seems tricky.
I've created 2 QVWs based on the following 2 scripts that try to even out the RAM size by keeping "dummy" fields.
StringKey.qvw
Load *, left(%String,1)&'X' as Dim;
Load
floor(rand()*26)+65 as Number1,
chr(floor(rand()*26)+65) & chr(floor(rand()*26)+65) as %String
AutoGenerate 50000000;
Load *, Number2 as Expression;
Load
chr(floor(rand()*26)+65) & chr(floor(rand()*26)+65) as %String,
floor(rand()*26)+65 as Number2
AutoGenerate 50000000;
IntegerKey.qvw
Load *, chr(%Number)&'X' as Dim;
Load
chr(floor(rand()*26)+65) & chr(floor(rand()*26)+65) as String1,
floor(rand()*26)+65 as %Number
AutoGenerate 50000000;
Load *, chr(Expression)&'X' as String2;
Load
floor(rand()*26)+65 as %Number,
floor(rand()*26)+65 as Expression
AutoGenerate 50000000;
In both QVWs I've placed all the list boxes and one chart with Dim as the dimension and sum(Expression) as the expression.
The StringKey.qvw is slightly larger in RAM at about 1.09 GB while NumberKey.qvw weighs in at about 1.04 GB, but further manipulation of the data models might case more harm.
After opening and closing the QVWs repeatedly and measuring the calc time of the graph, the faster chart seems to depend more on which QVW was opened first than whether the QVW has a string key or an integer key.
Calctime StringKey.qvw
2719 (1st open)
2750 (1st open)
2859
2781
2703 (1st open)
Calctime IntegerKey.qvw
2750
2781
2766 (1st open)
2718 (1st open)
2750
Where could I have gone wrong?
First, I'm assuming floor() returns a integer because QlikView Help ambiguously says it returns a number.
Second, the models aren't igual and I could have unintentionaly manipulated them in favor of StringKey.qvw
Third, I'm assuming a graph's calc time tells us how fast tables are linked although there can be other variables like the number of unique values in the dimension that determine its calc time.
Fourth, data model has too few tables to show the difference between integer keys and string keys.
Fifth, string values may be to short to show realistic results.
Well, it's a start...
Rob,
My understanding is that integer keys are faster than string keys to perform the link. That is the case in SQL and I believe that it is also the case in QlikView. I do not think that QV uses the pointers to perform the link.
The size of each pointer depends on the cardinality of the dimension. When are these pointers made ? When reading or at the end ?
When loading data ?
If they are built when loading the data, we can face a case when we have different cardinality and orders. if we have 129 different members in the dimension table (8 bytes) and only 120 in the fact table (7 bytes), I assume that QV will allocate the size of each pointer at the end of the reading of the table. If the cardinality is not the same between 2 tables, the pointers will not be the same.
The order of the values may also be different : in that case, the pointers will also be different.
At the end of the load.?
QV may also decide to group the different tables, the different fields and perform the compression at that moment : compute the cardinality, allocate the size for each pointer, attribute a unique pointer for each value across all tables. It will be less efficient at a disk storage point of view, but could be more efficient fr the links.
Henric Cronström writes that "During the QlikView script run, after each load statement, QlikView transforms the data loaded into two table types: one data table and several symbol tables. QlikView creates one symbol table per field. Source: http://community.qlik.com/blogs/qlikviewdesignblog/2012/11/20/symbol-tables-and-bit-stuffed-pointers
So, it it why I think that integer keys are faster to perform the links because they are used to do it. As in any comparison, integer will be faster than string.
Fabrice
Whether you have links that are strings, or links that are integers, should make no difference whatsoever in response time. In both cases, the evaluation (when you click) are made through the binary bit-stuffed pointers.
There are however two differences:
@ AUNEZ FABRICE If you have different cardinality in two tables, the symbol table will be created from the union of values in the two tables, and this (new) number of values will determine the length of the bit-stuffed pointers for all tables where this key is found. In other words, the symbol table is re-calculated when a field is encountered a second time. And so are the bit-stuffed pointers of the previous tables.
HIC
Henric,
Thanks a lot for explanation. I clearly did not understand your post. When reading it again, it is quite clear that QV computes the pointers for each field, and not as I believed, for each field within each table. Therefore, the cardinality is computed for all values whatever the number of tables they can be in. And one value has the same pointer for all tables: the link can be made on it, ensuring performance.
Many "Best Practices" that I have read and the document on LOAD that I have written contain this false assertion about integer keys to join table. See page 35 of this one (I do not know if it is an official one).
You have broken another myth !
Fabrice
Hello @rwunderlich and @hic
This topic is quite old. Since, there is the second version of QIX since december 2015. Is it still true? I have more than doubt t be honest.
Best regards,
Simon
Here the article https://community.qlik.com/t5/Design/QlikView-12-please-stand-up/bc-p/1475959
It is still true.
The evaluation (when you click) is made through the binary bit-stuffed indices. So, whether the data are strings or integers makes no difference whatsoever in response time.