The way I read it, autonumber stores the expression value and assigns it a unique integer, whereas autonumberhash128 stores only a 128-bit hash of the expression value. Therefore autonumberhash128 should be more storage-efficient (particularly when the expression values are long), and the document size should be smaller as a result.
Willing to be proven wrong though!
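To make the reading above concrete, here is a minimal Python sketch. It is not QlikView's implementation, which nobody in this thread has documented. The class names are hypothetical, and hashlib.md5 stands in for the undocumented 128-bit hash only because it produces a 16-byte digest. The point is to show what each lookup table would have to keep as its key under this model.

```python
import hashlib


class AutoNumber:
    """Hypothetical model of autonumber: the full expression value is the lookup key."""

    def __init__(self) -> None:
        self.ids: dict[str, int] = {}  # expression value -> integer id

    def __call__(self, value: str) -> int:
        # The first time a value is seen it gets the next integer; later calls reuse it.
        return self.ids.setdefault(value, len(self.ids) + 1)


class AutoNumberHash128:
    """Hypothetical model of autonumberhash128: only a 16-byte digest is the lookup key."""

    def __init__(self) -> None:
        self.ids: dict[bytes, int] = {}  # 128-bit digest -> integer id

    def __call__(self, value: str) -> int:
        # Assumption: MD5 stands in for QlikView's (undocumented) 128-bit hash.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return self.ids.setdefault(digest, len(self.ids) + 1)


an, anh = AutoNumber(), AutoNumberHash128()
long_key = "customer|" + "x" * 1000
print(an(long_key), anh(long_key))  # 1 1
print(an("short"), anh("short"))    # 2 2
# Size of the stored lookup key for the long value in each table.
print(len(next(iter(an.ids))), len(next(iter(anh.ids))))  # 1009 16
```

Under this model the hashed variant's keys are always 16 bytes, while the plain variant keeps the whole value. Whether QlikView really stores values this way is exactly what the thread is trying to find out.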
Thank you for your reply! Your reasoning seems logical to me. If I find the time, I will do a test to prove you are right.
In addition to my first question, I'm now doubting whether my unique 'semantic key', which I use as input for the autonumberhash128 function, always produces a unique hash and therefore a unique autonumber. According to Wikipedia (http://en.wikipedia.org/wiki/Hash_function), most hash algorithms cannot guarantee unique hashes for unique inputs. I can't find anything about this in the QlikView documentation, but I wonder if autonumberhash128 creates unique autonumbers for unique inputs. If not, I don't see a use for this function.
Can you (or anyone else) clarify this?
Hi to all,
I am also very interested in a detailed (technical) description of autonumber and autonumberhash128, and the differences between them regarding:
- memory usage
It would be nice if someone from the QlikView team could post some additional information about these functions here.
ddoord wrote: I wonder if autonumberhash128 creates unique autonumbers for unique inputs. If not, I don't see a use for this function.
While hash functions don't usually guarantee unique results, there are lots of ways for hash tables to handle collisions. I'd be shocked if QlikView isn't using something robust. Hashing can be pretty basic.
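For illustration, here is a minimal Python sketch of one common collision-handling technique, separate chaining, written as a hypothetical autonumber table. It makes no claim about how QlikView works internally. It uses a deliberately weak hash so that collisions are easy to provoke, uses a fixed bucket count (a real table would resize), and resolves collisions by comparing full keys, which means it only works if the original values are kept alongside their hashes.

```python
class ChainedAutoNumber:
    """Hypothetical autonumber table using separate chaining.

    Each bucket holds (key, id) pairs; keys that hash to the same bucket
    are told apart by comparing the full key, not just its hash.
    """

    def __init__(self, num_buckets: int = 8) -> None:
        self.buckets: list[list[tuple[str, int]]] = [[] for _ in range(num_buckets)]
        self.count = 0

    def _bucket(self, key: str) -> list[tuple[str, int]]:
        # Deliberately weak hash (sum of character codes) so that
        # collisions are easy to provoke in a small example.
        return self.buckets[sum(map(ord, key)) % len(self.buckets)]

    def __call__(self, key: str) -> int:
        bucket = self._bucket(key)
        for stored_key, number in bucket:
            if stored_key == key:  # full-key comparison resolves the collision
                return number
        self.count += 1
        bucket.append((key, self.count))
        return self.count


table = ChainedAutoNumber()
# "ab" and "ba" have the same character-code sum, so they share a bucket
# but still receive different numbers.
print(table("ab"), table("ba"), table("ab"))  # 1 2 1
```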
Speaking VERY generally and with no testing and no knowledge of their internal implementation, I'm GUESSING that the advantage of hashing over autonumber isn't in space utilization (it's a 16-byte result, after all), but rather in load speed. Let's say you have a million keys in your table. In an autonumber table, these keys are numbered 1, 2, 3... 1000000. Big loads would never finish if they linearly searched this table every time they came across a key during the load (O(n^2) for the whole load), so QlikView is probably using some sort of self-balancing tree, with O(n log n) for the whole load. A hash table, on the other hand, gives O(n) for the whole load in the typical case, which is going to be faster on large data sets, and quite possibly even on small ones, because hashing is simpler than maintaining a self-balancing tree.
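A rough Python sketch of the linear-search versus hash-table argument. It assumes nothing about QlikView internals. Both hypothetical functions assign the same sequential ids, but one scans a list of previously seen keys and the other uses a dict (Python's built-in hash table). A self-balancing tree would sit between the two at O(log n) per lookup and is left out to keep the example short.

```python
import time


def autonumber_linear(keys: list[str]) -> list[int]:
    """Assign ids by scanning a list of seen keys: O(n) per lookup, O(n^2) overall."""
    seen: list[str] = []
    out: list[int] = []
    for k in keys:
        for i, s in enumerate(seen):
            if s == k:
                out.append(i + 1)
                break
        else:  # no match found: k is new, give it the next id
            seen.append(k)
            out.append(len(seen))
    return out


def autonumber_hashed(keys: list[str]) -> list[int]:
    """Assign ids with a hash table: O(1) expected per lookup, O(n) overall."""
    ids: dict[str, int] = {}
    return [ids.setdefault(k, len(ids) + 1) for k in keys]


# 1,000 distinct keys, each seen three times.
keys = [f"key{i % 1000}" for i in range(3000)]
assert autonumber_linear(keys) == autonumber_hashed(keys)
for fn in (autonumber_linear, autonumber_hashed):
    t0 = time.perf_counter()
    fn(keys)
    print(fn.__name__, f"{time.perf_counter() - t0:.4f}s")
```

The timings depend on the machine, but the linear version's cost grows with the square of the number of keys while the dict version grows roughly linearly.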
I imagine the load speed difference would be negligible in most cases, though.
Perhaps I should do some testing.
I wonder if autonumberhash128 creates unique autonumbers for unique inputs
This is technically impossible to guarantee. No hash function can guarantee unique outputs for unique inputs unless its output is at least as large as the input itself, in which case there is no point in using such a hash function.
That said, collisions are rare for a well-designed hash function.
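Both points above (collisions can't be ruled out, but they are rare) can be checked numerically. This Python sketch truncates an MD5 digest to 8 bits to force a guaranteed collision (the pigeonhole principle). It then uses the standard birthday-bound approximation, p ≈ 1 − exp(−n(n−1) / 2^(b+1)), to estimate how likely any collision is among n distinct keys for a b-bit hash. It assumes the hash behaves like a uniform random function. QlikView's actual hash algorithm is not stated in this thread, and tiny_hash and collision_probability are hypothetical helpers.

```python
import hashlib
import math


def tiny_hash(value: str, bits: int = 8) -> int:
    """Hypothetical helper: keep only the top `bits` bits of an MD5 digest."""
    full = int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")
    return full >> (128 - bits)


# Pigeonhole: an 8-bit hash has only 256 possible outputs, so 257 distinct
# inputs are guaranteed to produce at least one collision.
outputs = {tiny_hash(str(i)) for i in range(257)}
print(len(outputs) < 257)  # True


def collision_probability(n: int, bits: int = 128) -> float:
    """Birthday-bound approximation for n keys under a uniform `bits`-bit hash."""
    # -expm1(-x) computes 1 - exp(-x) accurately when x is tiny.
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


print(collision_probability(10**6))  # about 1.5e-27 for a million keys
print(collision_probability(10**9))  # about 1.5e-21 for a billion keys
```

So a collision is possible in principle, but under the uniform-hash assumption it is astronomically unlikely at realistic key counts for a 128-bit hash.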