10 Replies Latest reply: Jul 3, 2014 5:56 AM by Nicolas MARTIN RSS

    Vectorial storage: how does it work?

    Nicolas MARTIN

      Hello,

       

      I have a table with about 50'000'000 prices.

      This tables contains the following columns:

      - id (a unique ID)

      - product_id

      - country_id

      - price_date

      - price_excluding_VAT

      This table weight about 500 Mo in a QVD.

       

       

      I have a table with the possible VAT by day and by country:

      - country_id

      - VAT_date

      - VAT (in %)

      This table is about 1 Mo.

       

       

      If I make a load in 2 tables, I have a $Syn key, and it's harder for me to do my expressions, but my QVW file is about 500 Mo.

      If I make a LEFT JOIN between the 2 tables using the columns "country_id" and "VAT_date / price_date", this adds a field "VAT" in my price table.

      I thought that it would not weigh down my QVW file, because it's often the same VAT from a day to another, and QV uses a vectorial storage.

       

      Unfortunatly, my QVW file is now 800 Mo!

      (and my join is correct, because I don't have any duplicated "id" after my join)

       

      I think I missed something in the "vectorial storage" behavior.

       

      Can anybody explain to me why I have such a big file?

        • Re: Vectorial storage: how does it work?
          Vlad Gutkovsky

          Silly question, but you renamed VAT_date to price_date before you joined, right?

           

          Vlad

            • Re: Vectorial storage: how does it work?
              Nicolas MARTIN

              Of course I did.

              That's why I don't have any duplicated "id" after my join.

                • Re: Vectorial storage: how does it work?
                  Vlad Gutkovsky

                  Are there any UI differences between the times that you saved? I recently found something quite shocking in QlikView: the file size will fluctuate depending on the filter selection state at the time of the save. I couldn't believe it at first, but it's definitely true.

                   

                  Vlad

                    • Re: Vectorial storage: how does it work?
                      Nicolas MARTIN

                      Interesting fact...

                       

                      In my case, I always clear all selections before saving my application.

                        • Re: Vectorial storage: how does it work?
                          Vlad Gutkovsky

                          Sounds like something hic might be interested in (if he has time, of course)...he understands the inner workings of QlikView storage better than anyone.

                           

                          Vlad

                            • Re: Vectorial storage: how does it work?
                              Henric Cronström

                              @ nmartin

                              I don't know what you mean with "Vectorial storage"... QlikView stores each table in several tables: One data table with pointers, and several symbol tables - one for each field. See more on Symbol Tables and Bit-Stuffed Pointers.

                               

                              Why the file increases in your case is impossible to say without an analysis of the memory usage - something which is easy to do. See more on Recipe for a Memory Statistics analysis. Then you can check whether it is the data table that is responsible for the entire increase or it has something to do with the symbol tables.

                               

                              My guess is that the data table is responsible for most of the memory usage, but I am still surprised that the file increases that much.

                               

                              HIC

                                • Re: Vectorial storage: how does it work?
                                  Nicolas MARTIN

                                  > I don't know what you mean with "Vectorial storage"

                                   

                                  In french, QlikView talks about "stockage vectoriel", that means if there is many occurences of the same value for the same field, it is stored only 1 time and there is a pointer on the value (exactly what you explain in your --very interesting-- article).

                                   

                                  .

                                  Here is my analysis:

                                  I have a source file "source.qvd" that is 87 MB.

                                   

                                   

                                  Scenario 1

                                  With the following script:

                                   

                                  t_price:

                                  load * FROM source.qvd (qvd);

                                   

                                  UNQUALIFY *;

                                  LEFT JOIN (t_price)

                                  LOAD

                                       DISTINCT %id,

                                       20 AS vat

                                  RESIDENT t_price;

                                   

                                  STORE t_price INTO DUMP_JOIN.qvd (qvd);

                                  STORE t_price INTO DUMP_JOIN.csv (txt);

                                   

                                   

                                  I have:

                                  - my QVW file is 53 MB

                                  - DUMP_JOIN.csv is 3'580'036 KB

                                  - DUMP_JOIN.qvd is 87'229 KB

                                  and the memory analysis is

                                  2014-07-03_094546.png

                                   

                                   

                                   

                                   

                                  Scenario 2

                                  With the following script:

                                   

                                  t_price:

                                  load * FROM source.qvd (qvd);

                                   

                                  UNQUALIFY *;

                                  RENAME TABLE t_price TO t_price2;

                                   

                                   

                                  t_price:

                                  NoConcatenate LOAD

                                  *,

                                  20 AS vat

                                  RESIDENT t_price2;

                                   

                                   

                                  DROP TABLE t_price2;

                                   

                                   

                                  STORE t_price INTO DUMP_LOAD.qvd (qvd);

                                  STORE t_price INTO DUMP_LOAD.csv (txt);

                                   

                                   

                                  I have:

                                  - my QVW file is 27 MB (lighter than before)

                                  - DUMP_LOAD.csv is 3'580'036 KB (exactly the same size as before)

                                  - DUMP_LOAD.qvd is 87'229 KB (exactly the same size as before)

                                  and the memory analysis is

                                  2014-07-03_095048.png

                                  (exactly the same size as before !)

                                   

                                  I conclude that the data are the same, but the storage inside QlikView application is different. And I don't understand why.

                                   

                                   

                                   

                                   

                                   

                                  Just for fun, I tried the following:

                                   

                                  t_price:

                                  load * FROM source.qvd (qvd);

                                   

                                  UNQUALIFY *;

                                  LEFT JOIN (t_price)

                                  LOAD

                                       DISTINCT %id,

                                       20 AS vat

                                  RESIDENT t_price;

                                   

                                   

                                  STORE t_price INTO TMP.csv (txt);

                                  DROP TABLE t_price;

                                  t_price:

                                  LOAD *

                                  FROM

                                  TMP.csv

                                  (txt, utf8, embedded labels, delimiter is ',', msq);

                                   

                                  And my QVW file is... 33 MB.

                      • Re: Vectorial storage: how does it work?
                        Nicolas MARTIN

                        Hmmm... Things gets weirder...

                         

                        I have only 1 table:

                         

                        t_price:

                        LOAD

                            rowno() AS %id,

                            price,

                            date,

                            many_other_fields

                        FROM ...;

                         

                         

                        This gives me 650'000 lignes, for a 23 MB .QVW file.

                         

                         

                        If I try to include a VAT by doing:

                         

                         

                        t_price:

                        LOAD

                            rowno() AS %id,

                            price,

                            date,

                            many_other_fields

                        FROM ...;

                         

                        LEFT JOIN (t_price)

                        LOAD

                             DISTINCT %id,

                             20 AS vat

                        RESIDENT t_price;

                         

                        (this is supposed to add 1 field "vat", that always contains the value "20")

                        This gives me 650'000 lignes, for a 54 MB .QVW file !!!

                         

                         

                        Whereas if I try to include the VAT by doing:

                         

                        t_price:

                        LOAD

                            rowno() AS %id,

                            price,

                            date,

                            many_other_fields

                        FROM ...;

                         

                        t_price2:

                        NoConcatenate

                        LOAD *,

                          20 AS vat

                        RESIDENT t_price;

                         

                         

                        DROP TABLE t_price;

                        RENAME TABLE t_price2 TO t_price;

                         

                        I still have the same 650'000 lines, I still have 1 more field "vat" that still has always the value "20", but my QVW file is only 26 MB.

                         

                        I don't understand why adding a field by a JOIN is heavier than adding it by a LOAD, whereas the data are the same at the end...

                         

                         

                         

                         

                        I tried to do both "JOIN" then "LOAD" just in case the "LOAD" had a "cleaning power". It don't. My file is still 54 MB.