Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
nicolas_martin
Partner - Creator II
Partner - Creator II

Vectorial storage: how does it work?

Hello,

I have a table with about 50'000'000 prices.

This tables contains the following columns:

- id (a unique ID)

- product_id

- country_id

- price_date

- price_excluding_VAT

This table weight about 500 Mo in a QVD.

I have a table with the possible VAT by day and by country:

- country_id

- VAT_date

- VAT (in %)

This table is about 1 Mo.

If I make a load in 2 tables, I have a $Syn key, and it's harder for me to do my expressions, but my QVW file is about 500 Mo.

If I make a LEFT JOIN between the 2 tables using the columns "country_id" and "VAT_date / price_date", this adds a field "VAT" in my price table.

I thought that it would not weigh down my QVW file, because it's often the same VAT from a day to another, and QV uses a vectorial storage.

Unfortunatly, my QVW file is now 800 Mo!

(and my join is correct, because I don't have any duplicated "id" after my join)

I think I missed something in the "vectorial storage" behavior.

Can anybody explain to me why I have such a big file?

1 Solution

Accepted Solutions
hic
Former Employee
Former Employee

Do you use compression when saving the file? (Doc Properties - General - Save Format) If so, is there still a difference if you turn compression off?

HIC

View solution in original post

10 Replies
vgutkovsky
Master II
Master II

Silly question, but you renamed VAT_date to price_date before you joined, right?

Vlad

nicolas_martin
Partner - Creator II
Partner - Creator II
Author

Of course I did.

That's why I don't have any duplicated "id" after my join.

vgutkovsky
Master II
Master II

Are there any UI differences between the times that you saved? I recently found something quite shocking in QlikView: the file size will fluctuate depending on the filter selection state at the time of the save. I couldn't believe it at first, but it's definitely true.

Vlad

nicolas_martin
Partner - Creator II
Partner - Creator II
Author

Interesting fact...

In my case, I always clear all selections before saving my application.

vgutkovsky
Master II
Master II

Sounds like something Henric Cronström might be interested in (if he has time, of course)...he understands the inner workings of QlikView storage better than anyone.

Vlad

nicolas_martin
Partner - Creator II
Partner - Creator II
Author

Hmmm... Things gets weirder...

I have only 1 table:

t_price:

LOAD

    rowno() AS %id,

    price,

    date,

    many_other_fields

FROM ...;

This gives me 650'000 lignes, for a 23 MB .QVW file.

If I try to include a VAT by doing:

t_price:

LOAD

    rowno() AS %id,

    price,

    date,

    many_other_fields

FROM ...;

LEFT JOIN (t_price)

LOAD

     DISTINCT %id,

     20 AS vat

RESIDENT t_price;

(this is supposed to add 1 field "vat", that always contains the value "20")

This gives me 650'000 lignes, for a 54 MB .QVW file !!!

Whereas if I try to include the VAT by doing:

t_price:

LOAD

    rowno() AS %id,

    price,

    date,

    many_other_fields

FROM ...;

t_price2:

NoConcatenate

LOAD *,

  20 AS vat

RESIDENT t_price;

DROP TABLE t_price;

RENAME TABLE t_price2 TO t_price;

I still have the same 650'000 lines, I still have 1 more field "vat" that still has always the value "20", but my QVW file is only 26 MB.

I don't understand why adding a field by a JOIN is heavier than adding it by a LOAD, whereas the data are the same at the end...

I tried to do both "JOIN" then "LOAD" just in case the "LOAD" had a "cleaning power". It don't. My file is still 54 MB.

hic
Former Employee
Former Employee

@ Nicolas MARTIN

I don't know what you mean with "Vectorial storage"... QlikView stores each table in several tables: One data table with pointers, and several symbol tables - one for each field. See more on Symbol Tables and Bit-Stuffed Pointers.

Why the file increases in your case is impossible to say without an analysis of the memory usage - something which is easy to do. See more on Recipe for a Memory Statistics analysis. Then you can check whether it is the data table that is responsible for the entire increase or it has something to do with the symbol tables.

My guess is that the data table is responsible for most of the memory usage, but I am still surprised that the file increases that much.

HIC

nicolas_martin
Partner - Creator II
Partner - Creator II
Author

> I don't know what you mean with "Vectorial storage"

In french, QlikView talks about "stockage vectoriel", that means if there is many occurences of the same value for the same field, it is stored only 1 time and there is a pointer on the value (exactly what you explain in your --very interesting-- article).

.

Here is my analysis:

I have a source file "source.qvd" that is 87 MB.

Scenario 1

With the following script:

t_price:

load * FROM source.qvd (qvd);

UNQUALIFY *;

LEFT JOIN (t_price)

LOAD

     DISTINCT %id,

     20 AS vat

RESIDENT t_price;

STORE t_price INTO DUMP_JOIN.qvd (qvd);

STORE t_price INTO DUMP_JOIN.csv (txt);

I have:

- my QVW file is 53 MB

- DUMP_JOIN.csv is 3'580'036 KB

- DUMP_JOIN.qvd is 87'229 KB

and the memory analysis is

2014-07-03_094546.png

Scenario 2

With the following script:

t_price:

load * FROM source.qvd (qvd);

UNQUALIFY *;

RENAME TABLE t_price TO t_price2;

t_price:

NoConcatenate LOAD

*,

20 AS vat

RESIDENT t_price2;

DROP TABLE t_price2;

STORE t_price INTO DUMP_LOAD.qvd (qvd);

STORE t_price INTO DUMP_LOAD.csv (txt);

I have:

- my QVW file is 27 MB (lighter than before)

- DUMP_LOAD.csv is 3'580'036 KB (exactly the same size as before)

- DUMP_LOAD.qvd is 87'229 KB (exactly the same size as before)

and the memory analysis is

2014-07-03_095048.png

(exactly the same size as before !)

I conclude that the data are the same, but the storage inside QlikView application is different. And I don't understand why.

Just for fun, I tried the following:

t_price:

load * FROM source.qvd (qvd);

UNQUALIFY *;

LEFT JOIN (t_price)

LOAD

     DISTINCT %id,

     20 AS vat

RESIDENT t_price;

STORE t_price INTO TMP.csv (txt);

DROP TABLE t_price;

t_price:

LOAD *

FROM

TMP.csv

(txt, utf8, embedded labels, delimiter is ',', msq);

And my QVW file is... 33 MB.

hic
Former Employee
Former Employee

Do you use compression when saving the file? (Doc Properties - General - Save Format) If so, is there still a difference if you turn compression off?

HIC