Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Write Table now available in Qlik Cloud Analytics: Read Blog
hic
Former Employee
Former Employee

 

Today I have a blog post for the Geeks¹. For the hard-core techies who love bits and bytes. The rest of you can stop reading now. For you, there are other interesting posts in the Business Discovery Blog and in this blog, the QlikView Design blog.

 

Now to the bit-stuffed pointers:

 

During the QlikView script run, after each load statement, the Qlik engine transforms the data loaded into two table types: one data table and several symbol tables. The engine creates one symbol table per field:

 

Symbol tables.png

 

The symbol tables contain one row per distinct value of the field. Each row contains a pointer and the value of the field, both the numeric value and the textual component. Basically, the symbol tables are look-up tables for the field values.

 

The data tables are the same tables as you can see in the QlikView internal table viewer (<CTRL>-T) when you have chosen the “Internal table view”  – the same number of rows, the same number of columns. However, the tables do not contain the data itself – they contain the pointers only. But since the pointers can be used to look up the real value in the symbol tables, no information has been lost.

 

Data table.png

 

These pointers are no ordinary pointers. They are bit-stuffed indices, meaning – they only have as many bits that it takes to represent the field, never more. So if a field contains four distinct values, the index is only two bits long, because that is the number of bits it takes to represent four values. Hence, the data table becomes much smaller than it would have been otherwise.

 

The bit-stuffed pointers and the symbol tables are the reasons why the Qlik engine can compress data the way it can.

 

Understanding this will help you optimize your document. It’s obvious that the number of records and number of columns in a table will affect the amount of memory used, but there are also other factors:

 

  • The length of the symbols will affect the size of the symbol table.
  • The number of distinct values in a field will affect the number of rows in the symbol table as well as the length of the pointers.

 

When creating QlikView scripts, always ask yourself if there is any way to reduce these numbers, to minimize the memory usage. Here are a couple of common cases:

 

  • You have a long, concatenated, composite key that you don’t need to display. Use Autonumber() and the symbols will take no space in the symbol table. The integer values will instead be calculated implicitly.
  • You have a field with many unique timestamps. Then you are sometimes better off if you first split it into two fields – Date and Time – and round the Time downwards to closest 15-seconds interval or to nearest full minute, e.g.:
       Date(Floor(Timestamp)) as Date,
       Time(Floor(Frac(Timestamp),1/24/60)) as Time,
    These expressions will give you at most 24*60=1440 distinct time values (11 bits) and typically 365 distinct dates (9 bits). In other words, as soon as you have a timestamp field with more than 1 million (20 bits) distinct values, the pointer for the timestamp field takes more space than the pointers for the two individual fields. And for the number of rows in the symbol table(s) you hit the break-even much sooner. So you should consider splitting it into two fields sooner, maybe when you have around 100k distinct values.

 

If you found this post interesting, I greet you welcome to the QlikGeeks.

 

HIC

 

PS. All of the above is of course true for both QlikView and Qlik Sense. Both use the same engine.

 

If you want to read more about QlikView internals, see also

Logical Inference and Aggregations

Colors, States and State vectors

The Calculation Engine

 

¹ Geeks, see picture:

Geeks.png

71 Comments
hic
Former Employee
Former Employee
1,205 Views
Not applicable

another my fav article

The QlikView Cache

0 Likes
1,205 Views
joydipp1988
Creator
Creator

Dear hic‌,

I find this blog post extremely beautiful and interesting. But still I think I am a nerd.

Whatever, I have a question regarding this topic. Here it is-

1) I would like to know which Data Structure is it following. My idea is Linked List. Please elaborate and correct me if I'm wrong.

2) In your blog you are telling that, "So if a field contains four distinct values, the pointer is only two bits long, because that is the number of bits it takes to represent four values."

I clearly understood how 2 bit long pointer is enough to represent 4 distinct values as 2 bit long pointers has 4 combinations like 00,01,10,11. But I'm not been able to apply the idea for other values.

So if,

No of distinct values under a field = m

No of bits in the Bit-Stuffed pointer = n

Then, is there any relation between m and n or is there any rule to find No of bits in the Bit-Stuffed pointer

from the no of distinct values present ? I'll eagerly wait for your reply.

Thanks in advance,

Joy

0 Likes
1,106 Views
rbecher
MVP
MVP

Hi Joy,

spoken in Qlik, the bitsize (no of bits) is:

n = floor(log10(m)/log10(2))+1

- Ralf

1,106 Views
hic
Former Employee
Former Employee

A slight tweak of Ralf's formula:

floor(log10(m-1)/log10(2))+1 as n

resulting in the following table:

n (bits)m (no of distinct values)
23-4
35-8
49-16
517-32
633-64
765-128
8129-256
9257-512
10513-1024

HIC

1,106 Views
rbecher
MVP
MVP

‌Of course, how could I forget! You missed the first two lines:

1 symbol = 0 bits

2 symbols = 1 bit

Not to mention NULL values. This is another story..

0 Likes
1,106 Views
swuehl
MVP
MVP

Seems like there is a brand new technology involved, representing 1 symbol with zero bits

1,106 Views
rbecher
MVP
MVP

‌Yes, indeed. A patented one. It's the data reduction of QlikView. You do not need a pointer in the record if all rows containing the same symbol in a field. Just have a look into a QVD file. You will learn a lot!

1,106 Views
joydipp1988
Creator
Creator

Hi Ralph, Can you explain with a diagram ?

0 Likes
1,143 Views
swuehl
MVP
MVP

Ah, that's why haven't mentioned NULL, right?

edit:

I think we should  forget the NULLs for now, this will just add confusion for most people.

Thanks for the hint with the QVD files, Ralf.

The problem with looking at the real implementations of QV software is that this will somehow break what Henric tried to explain in his blog post.

For example, I can easily create a field with one symbol that takes up to 8 bits:

  <QvdFieldHeader>

       <FieldName>B1</FieldName>

       <BitOffset>0</BitOffset>

       <BitWidth>8</BitWidth>

       <Bias>0</Bias>

       <NumberFormat>

         <Type>INTEGER</Type>

         <nDec>0</nDec>

         <UseThou>1</UseThou>

         <Fmt>###0</Fmt>

         <Dec>,</Dec>

         <Thou></Thou>

       </NumberFormat>

       <NoOfSymbols>1</NoOfSymbols>

       <Offset>0</Offset>

       <Length>5</Length>

       <Comment></Comment>

       <Tags></Tags>

     </QvdFieldHeader>

But I believe all these optimizations, byte paddings etc. are not really helping in understanding what Henric tried to explain (or, maybe he wants to create a new blog post "the return of the bit-stuffed pointers").

Regards,

Stefan

1,143 Views