renjithpl
Specialist

Autonumber or Autonumber#

Hi All,

Which function will reduce the number of bytes of a complex field?

E.g., Autonumber(Field1 & '##' & Field2)  as Reduced

                       or

Autonumberhash128(Field1 & '##' & Field2)  as Reduced

Thanks

Ren

11 Replies
maniram23
Creator II

Hi,

Autonumberhash128(Field1,'##',Field2) 

It will reduce the number of bytes.

swuehl
MVP

I believe the final symbol size should be identical and minimal in both cases.

The second function only calculates an intermediate, temporary hash value from its arguments.

I personally would go for Autonumber().

renjithpl
Specialist
Author

Thanks for the reply. I would like to keep this open for now.

swuehl
MVP

Sure.

For this kind of question, it's also easy to create a test script and actually see how each function performs on your real data.

AFAIK, if you create sequential integers using autonumber / autonumberhash128, QV does not actually store a symbol table at all, since the values can be derived from the bit pointers themselves.

Not applicable

I would prefer the Autonumber function.

HirisH_V7
Master

Hi,

Check this:

AutoNumber() vs Auto..Hash() | Qlik Community

HTH,
HirisH

Peter_Cammaert
Partner - Champion III

Stefan, this is interesting.

However, I guess I'm not getting your point in its entirety (my bad). And Rob lets go as soon as he touches on AutoNumberHashxxx (see the link in HirisH's post a little further).

The hash variants serve a purpose I think, but indeed it isn't easy to determine what purpose exactly.

IMHO, as long as a script is reading field text values (which the OP will get when inserting ## into the key values) to autonumber/autonumberhash, there must be a symbol table somewhere. Otherwise you'll number identical strings twice. The advantage of the hash functions is that the symbol table will store comparison values of a predetermined size (either 16 bytes for the 128-bit hash or 32 bytes for the 256-bit hash), while an autonumber table has to store every string but doesn't know beforehand what the maximum text string size will be.
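To make the difference between the two lookup strategies concrete, here is a minimal Python sketch. It is only a model of the idea, not Qlik's actual implementation: the function names are made up for illustration, and MD5 is used purely as a stand-in for "some 128-bit hash" (which hash Qlik really uses is not stated in this thread).

```python
import hashlib


def autonumber_by_string(values):
    """Number distinct strings, keeping every full string as a lookup key."""
    lookup = {}   # full string -> sequential integer
    numbers = []
    for value in values:
        if value not in lookup:
            lookup[value] = len(lookup) + 1
        numbers.append(lookup[value])
    return numbers, lookup


def autonumber_by_hash(values):
    """Number distinct strings, keeping only a fixed-size digest as lookup key."""
    lookup = {}   # 16-byte digest -> sequential integer
    numbers = []
    for value in values:
        # Assumption: MD5 stands in for a generic 128-bit hash.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        if digest not in lookup:
            lookup[digest] = len(lookup) + 1
        numbers.append(lookup[digest])
    return numbers, lookup


# Hypothetical composite keys built like Field1 & '##' & Field2.
rows = [
    ("Customer-0001", "2024-01-31"),
    ("Customer-0002", "2024-01-31"),
    ("Customer-0001", "2024-01-31"),
]
keys = [f"{field1}##{field2}" for field1, field2 in rows]

ids_by_string, string_table = autonumber_by_string(keys)
ids_by_hash, hash_table = autonumber_by_hash(keys)

print(ids_by_string, ids_by_hash)
print("string key sizes:", [len(k.encode("utf-8")) for k in string_table])
print("hash key sizes:  ", [len(k) for k in hash_table])
```

Both variants hand out the same sequential numbers. The only difference in this model is what the temporary lookup table has to hold: keys as long as the input strings, or keys of a fixed 16 bytes regardless of input length.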

Since a hash function tries to calculate a unique value for every string imaginable, I think in the long run (high cardinality and large strings) the hash functions will be more RAM efficient than just autonumber.

Does the autonumber symbol table survive script execution? It shouldn't, because the only value QV needs in order to do whatever it needs to do after a reload is the autonumber value itself. However, a reload has its own memory requirements, and these must be taken into account as well.

I don't really know for sure. I think Renjith is right in keeping the discussion open for a while longer.

rwunderlich
Partner Ambassador/MVP

Peter,

Interesting point on the RAM requirements of both during reload. It would be interesting to test. I wouldn't be surprised if AutoNumber() was hashing its temp lookup table, and therefore using the same RAM as the AutoNumberHash*() functions anyway. Let us know if you get a chance to test.

-Rob

swuehl
MVP

Peter,

Yes, there needs to be a symbol table / lookup table / hash table during the LOAD. I was talking about the resulting data model.

The bit stuffed pointers are just the index of the values' symbol table positions, right?

If you have sequential integer numbers 1 to 10 as field values, like generated by an AutonumberXXX() function call, the index pointers will be 1 to 10, too, so each value's pointer is equal to the value.

Hence, there is no need to store the symbol table at all (and that is AFAIK what QV does, deriving the symbols directly from the pointer index in the record table). Again, 'storing' means surviving the script execution.
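The reasoning above can be sketched in a few lines of Python. This is a toy model of a symbol table plus pointer column, not QV's real storage format: `build_field` and `is_derivable` are hypothetical helper names, and the pointers here are 0-based, so the value is the pointer plus an offset of 1 rather than exactly equal to it.

```python
def build_field(values):
    """Split a column into a symbol table (distinct values, in order of
    first appearance) and one pointer per record into that table."""
    symbols = []
    index_of = {}
    pointers = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(symbols)
            symbols.append(value)
        pointers.append(index_of[value])
    return symbols, pointers


def is_derivable(symbols, offset=1):
    """True if every symbol equals its table position plus a fixed offset,
    i.e. the symbol table carries no information beyond the pointers."""
    return all(symbol == position + offset
               for position, symbol in enumerate(symbols))


# A column as an autonumber-style function would produce it:
# sequential integers handed out in order of first appearance.
autonumbered = [1, 2, 1, 3, 2, 4]
symbols, pointers = build_field(autonumbered)

# The values can be rebuilt from the pointers alone.
restored = [pointer + 1 for pointer in pointers]

# A text key column, by contrast, needs its symbol table.
text_symbols, text_pointers = build_field(["A##x", "B##y", "A##x"])

print(symbols, pointers, is_derivable(symbols))
print(text_symbols, text_pointers, is_derivable(text_symbols))
```

In this model the integer column's symbol table is redundant, while the text column's table is not, which is the point about what has to survive the script execution.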

The AutonumberHash*() and Autonumber() functions should basically produce the same result, but your point is valid: the way they achieve it might differ in terms of memory consumption / CPU cycles needed during the script execution.

I think the only people who can really answer this question might be the developers at Qlik. Maybe this would be a topic for a Qlik design blog post by hic.