Solved: Size issues with MS uniqueidentifier? - Qlik Community

Report Inappropriate Content · ‎2010-12-14

HI

Wondering - my data contains many fields that contain the Microsoft 38 byte OID (generated by newID() function) values.

Does it make sense to try to replace them with shorter QV hash values or will I run into problems when tying to link such fields?

Or is QV anyway using smaller values to optimize on space?

Thanks

Juerg

rwunderlich · ‎2010-12-15


maier_con wrote: Do I really get the same autonumber value back for the same uniqueidentifier?

Yes, you do. As long as all the calls to autonumber() are in the same script run. Consider the following sequence of calls to autonumber()

Param Return
ZZRT 0
ABEDF 1
RESTG 2
ZZRT 0
JKLM 3

The first value passed to autonumber() gets assigned 0, 1 to the next and so on. When a previously "seen" value (ZZRT) is passed, it gets the same value (0) as the first time it was passed. This is a lookup function, not a hash function. Different param values will always give different return values, and the same param value will always give the same return value.

Each unique 34 byte PK will require about 40 bytes of storage. So 100k keys will require about 4MB of symbol space in RAM. That's not much. The autonumbered version would require no symbol space saving you the 4MB. Using autonumber() will make the script run slower, but potentially make the UI run slightly faster.

-Rob

View solution in original post

rwunderlich · ‎2010-12-14

If you don't need to display the actual OID value, but only use them for keys, you can turn them into compact keys using the QV autonumber() function.

If you have many different values for OID, autonumber may save significant space. If you have a lot of rows that have OID, but a small set of distinct OID values, autonumber won't save much space and is probably not worth the trouble.

-Rob

Report Inappropriate Content · ‎2010-12-15

Hi Rob

Thanks for you jumping in, but I still have my doubts about the autonumber function(s).

Assume I have a 100'000 article table and a selling table averaging about 1000 sales per article in my database.

The article table has a uniqueidentifyer as PK (34 bytes), the sales table points to the article by this FK.

Is it save now to say in QV

load autonumber(articleOID) as ArticleID ... from articles

and

load autonumber(articleOID) as ArticleID ... from sales

to get the equivalent of the SQLServer FK constraint and the QV autojoin on the ArticleID? Any possible mismatch will cancel my project.

I can't really see how this is done. Do I really get the same autonumber value back for the same uniqueidentifier?

Thanks

Juerg

rwunderlich · ‎2010-12-15


maier_con wrote: Do I really get the same autonumber value back for the same uniqueidentifier?

Yes, you do. As long as all the calls to autonumber() are in the same script run. Consider the following sequence of calls to autonumber()

Param Return
ZZRT 0
ABEDF 1
RESTG 2
ZZRT 0
JKLM 3

The first value passed to autonumber() gets assigned 0, 1 to the next and so on. When a previously "seen" value (ZZRT) is passed, it gets the same value (0) as the first time it was passed. This is a lookup function, not a hash function. Different param values will always give different return values, and the same param value will always give the same return value.

Each unique 34 byte PK will require about 40 bytes of storage. So 100k keys will require about 4MB of symbol space in RAM. That's not much. The autonumbered version would require no symbol space saving you the 4MB. Using autonumber() will make the script run slower, but potentially make the UI run slightly faster.

-Rob

Report Inappropriate Content · ‎2010-12-17

Hi Rob

As it will use only 4 MB for the key in the article table the sales table will use 4 GB just for the reference?

Juerg

rwunderlich · ‎2011-01-03

Hi Juerg,

My understanding is that both tables will use a reference -- to the 4MB of values, which are stored in a separate symbol table.

Autonumber values are sequential integers. Sequential integers take advantage of a special optimization where the state array (selected on/off) can also represent the values, so no symbol space is required.

If I got this wrong I hope someone with more internals knowledge (like Håkan or Jay) will correct me.

-Rob