Qlik Community

Ask a Question

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Announcements
Welcome to our newly redesigned Qlik Community! Read our blog to learn about all the new updates: READ BLOG and REPORTED ISSUES

Part of my job as an Enterprise Architect at Qlik is dealing with hardware sizing. There are several factors to take into account to properly size any environment, and one of the most underrated I have found when discussing this subject with IT teams and developers, is the value of distinctness. But this is exactly one of the things that makes QlikView different. Let me explain this by taking a look at an example.

My app needs a calendar from January 1, 2011 to February 25, 2013. This data is stored in the database in the form of a timestamp:

DD/MM/YYYY hh:mm:ss

That makes 99 531 648 possible values. This field alone, when the QVW is opened, uses 2.3 GB of RAM.

That’s quite a bit of RAM for only one field. Let’s do some math. QlikView pointers are bit stuffed pointers meaning that QlikView needs 27 bits to store 99.5 million values. So, to calculate the RAM usage for the symbol tables, you need to use the following formula:

(Field Length in bytes * Number of Values) + Pointer Size (2^27) = Theoretical Total Size of Model

(18 * 99 531 648) + 134 217 728 = 1 925 787 392 Bytes

So for that field alone, I need to allocate 2 GB RAM, of which 134 MB are to store the pointers. It seems to be quite big for this small part of the model.

Since I cannot afford that much for one field alone, I’m going to split that field into two: date and time.

Now I have a Date field of 1 152 possible values and a Time field of 86 400 possible values. These two fields, when the QVW is open uses 24 MB of RAM.

Let’s do the math with these figures, using the same formula for Date and Time:

Date: 13 568 Bytes

Time: 707 576 Bytes

Because I’m always thinking of optimizing the models, I have decided to discard seconds, as they are not going to be used in my app, but I do need hour and minute.

The Date field has 1 152 possible values but the Time field is now reduced to 1 439 possible values. These two fields, when the QVW is open use 3.9 MB of RAM.

Let’s do the math with these figures, using the same formula

Date: 13 568 Bytes

Time: 9 243 Bytes

These file sizes may vary slightly depending on the file system, size of hard disk, amount of RAM in the system, etc.

We have seen some figures handling dates, but use this same technique with full names, addresses, composite fields, phone numbers… This is when the associative magic happens!

As you can see, distinctness is everything but trivial in QlikView, and taking proper care of fields when developing from the start will make the difference between a viable, well performing model and app and a huge, slow app.

Do you have any examples or tips and tricks of the value of distinctness?

Appendix I: Table with file sizes and RAM footprint for each example

qcomDBlogUsageTable.PNG.png

Appendix II: Script code used for each scenario, that you can download here.

24 Comments
Partner
Partner

great great great...!

Wonderfull doc!

Thanks

Magnific!

3,134 Views
Not applicable

Excellent doc.

I have two questions:

1) the field length is 18. Is is just for dates ? for all fields (because there is no data type in QV) ? (I thought it was the number of bytes, here 27 but it seems that it is not the case).

2) why a day is composed of 86 399 seconds and not 86 400 (24*60*60) ? Is there a second that should not be there like 00:00:00 ?

I have learnt a lot, thank you

Fabrice


3,134 Views
Specialist II
Specialist II

It looks like today's job is to trim all of the date/time values from my loads. Thanks!

3,134 Views

Hi Fabrice,

To your first question, it is for this case as returned by the Len() function, because the representation is a literal, not a numeric value.

To your second, it's just a question of scripting. If 00:00:00 is the second 0, then 23:59:59 is the second 86 399. IterNo() will always return 1 as first value, however, for me the first second of each day is 00:00:00. That's why I have to substract 1 in the WHILE part.

If you are familiar with MS programming languages, think of the "Option Base" statement in Visual Basic to set arrays start their indexes in 1 and not in 0 (or viceversa).

You could use a different script using:


SET vMinDate = Date('31/12/2010');

instead of

SET vMinDate = Date('01/01/2011');


Discarding the "-1" in the rest of the script and using 86400 as the number of iterations in the loop.

Hope that makes sense.

Miguel

3,134 Views
Not applicable

Miguel,

Thanks for answering so fast.

1) 18 = length of the field. OK. I did not catch you used the Len() function to find that number. Now it is clear. And logic also.

2) You write "Now I have a Date field of 1 152 possible values and a Time field of 86 399 possible values." I was not speaking about scripting and iterations, but just number of possibilities. I think there are 86 400 possibilities (perhaps from 0 to 86 399) because there are 86 400 seconds in the day, and we count every second. No ? I perhaps did not understand correctly your sentences.

Fabrice

3,134 Views

Hi Fabrice,

You are totally correct, it's a mistake. I'm reviewing it right away.

Thanks!

3,134 Views