Hi All,
Which function will reduce the number of bytes of a complex field?
E.g., Autonumber(Field1 & '##' & Field2) as Reduced
or
Autonumberhash128(Field1 & '##' & Field2) as Reduced
Thanks
Ren
Hi,
Autonumberhash128(Field1,'##',Field2)
It will reduce the number of bytes.
I believe the final symbol size should be identical and minimal in both cases.
The second function only calculates an intermediate, temporary hash value from its arguments.
I personally would go for Autonumber().
Thanks for the reply. I would like to keep the thread open for now.
Sure.
For this kind of question, it's also easy to create a test script and actually see how each function performs on your real data.
AFAIK, if you create sequential integers using Autonumber / AutonumberHash128, QV does not actually store a symbol table at all, since the values can be derived from the bit-stuffed pointers themselves.
I would prefer Autonumber function.
Hi,
Check this:
AutoNumber() vs Auto..Hash() | Qlik Community
HTH,
HirisH
Stefan, this is interesting.
However, I guess I'm not getting your point in its entirety (my bad). And Rob is letting go as soon as he touches AutoNumberHashxxx (see the link in HirisH's post a little further)
The hash variants serve a purpose I think, but indeed it isn't easy to determine what purpose exactly.
IMHO, as long as a script is reading field text values (which the OP will get when inserting ## into the key values) to autonumber/autonumberhash, there must be a symbol table somewhere. Otherwise you'll number identical strings twice. The advantage of the hash functions is that the symbol table will store comparison values of a predetermined size (16 bytes for the 128-bit variant or 32 bytes for the 256-bit variant), while an autonumber table has to store every string but doesn't know beforehand what the maximum text string size will be.
Since a hash function tries to calculate a unique value for every string imaginable, I think in the long run (high cardinality and large strings) the hash functions will be more RAM efficient than just autonumber.
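To make the lookup-table argument concrete, here is a rough Python model of the two approaches during a load. It is only a sketch under assumptions: Qlik's internal structures and hash algorithm are not documented in this thread, so MD5 is used purely as a stand-in for "some 128-bit hash", and the function names are hypothetical.

```python
import hashlib


def autonumber_model(keys):
    """Number keys using a lookup table that stores each full string."""
    lookup = {}
    numbers = []
    for key in keys:
        # setdefault returns the existing number, or assigns the next one
        numbers.append(lookup.setdefault(key, len(lookup) + 1))
    return numbers, lookup


def autonumberhash128_model(keys):
    """Number keys using a lookup table that stores a 16-byte digest."""
    lookup = {}
    numbers = []
    for key in keys:
        # Assumption: MD5 stands in for whatever 128-bit hash Qlik uses
        digest = hashlib.md5(key.encode("utf-8")).digest()
        numbers.append(lookup.setdefault(digest, len(lookup) + 1))
    return numbers, lookup


def key_bytes(lookup):
    """Total bytes held by the lookup keys (ignores container overhead)."""
    return sum(len(k.encode("utf-8")) if isinstance(k, str) else len(k)
               for k in lookup)


if __name__ == "__main__":
    keys = ["Customer-000123##Order-2015-000987",
            "Customer-000124##Order-2015-000988",
            "Customer-000123##Order-2015-000987"]
    plain_numbers, plain_lookup = autonumber_model(keys)
    hash_numbers, hash_lookup = autonumberhash128_model(keys)
    print(plain_numbers, key_bytes(plain_lookup))
    print(hash_numbers, key_bytes(hash_lookup))
```

Both models return the same sequential integers. What differs is the size of the keys kept in the temporary table: variable-length strings in one case, fixed 16-byte digests in the other. For short keys the digest can actually be larger than the string, so any saving would only show up with long key strings.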
Does the autonumber symbol table survive script execution? It shouldn't, because the only value QV needs in order to do whatever it needs to do after a reload is the autonumber value itself. However, a reload has its own memory requirements and these must be taken into account as well.
I don't really know for sure. I think Renjith is right in keeping the discussion open for a while longer.
Peter,
Interesting point on the RAM requirements of both during reload. It would be interesting to test. I wouldn't be surprised if AutoNumber() was hashing its temp lookup table, and therefore using the same RAM as the AutoNumberHash*() functions anyway. Let us know if you get a chance to test.
-Rob
Peter,
Yes, there needs to be a symbol table / lookup table / hash table during the LOAD; I was talking about the resulting data model.
The bit stuffed pointers are just the index of the values' symbol table positions, right?
If you have sequential integer numbers 1 to 10 as field values, such as those generated by an AutonumberXXX() function call, the index pointers will be 1 to 10, too, so each value's pointer is equal to the value.
Hence, there is no need to store the symbol table at all (and that is AFAIK what QV does, deriving the symbols directly from the pointer index in the record table). Again, 'storing' means surviving the script execution.
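The pointer argument can be sketched with a toy model in Python. This is an illustration of the reasoning only, not of QlikView's actual storage format: the function names are hypothetical, and the pointers here are 0-based, so a value equals its pointer plus one.

```python
def build_field(values):
    """Split a column into a symbol table and per-row pointers into it."""
    symbols = []   # distinct values, in order of first appearance
    position = {}  # value -> index in symbols
    pointers = []  # one symbol-table index per row
    for value in values:
        if value not in position:
            position[value] = len(symbols)
            symbols.append(value)
        pointers.append(position[value])
    return symbols, pointers


def symbol_table_is_implicit(symbols):
    """True when every symbol equals its (0-based) position plus one,
    so the symbols could be recomputed from the pointers alone."""
    return all(symbol == index + 1 for index, symbol in enumerate(symbols))


def bits_per_pointer(symbols):
    """Bits needed for one pointer when there are len(symbols) distinct values."""
    return max(1, (len(symbols) - 1).bit_length())


if __name__ == "__main__":
    # Autonumbered keys: sequential integers in order of first appearance
    symbols, pointers = build_field([1, 2, 1, 3, 2])
    print(symbols, pointers, symbol_table_is_implicit(symbols))

    # Raw composite keys: the strings themselves have to be stored
    symbols, pointers = build_field(["A##1", "B##2", "A##1"])
    print(symbols, pointers, symbol_table_is_implicit(symbols))
```

In this model the autonumbered field needs only the pointers, while the raw string field needs the pointers plus every distinct string. Whether QV really drops the symbol table in the first case is the AFAIK part of the argument above.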
The AutonumberHash*() and Autonumber() functions should basically produce the same result, but your point is valid: the way they get there might differ in terms of memory consumption / CPU cycles needed during script execution.
I think the only people who can really answer this question are the developers at Qlik; maybe this would be a topic for a Qlik design blog post by hic.