7 Replies Latest reply: Aug 3, 2015 1:45 PM by Sandro Krumbein RSS

    Influencing factors for performance of preceding load

    Sandro Krumbein

      Hello!

       

      For posts in my blog I did experiments on how using a preceding load influences the performance of a load command and which factors contribute to the performance difference. In my last post I ran into results, that I can't explain.

       

      The summary:

      When using a command with a preceding load layer to copy values from one table to another, there seems to be no dependence on the number of distinct values (which is what I expected as the values are represented by more bits in the pointers in the symbol table), but there is a dependence on the type of data in the copied field (which I didn't expect).

       

      Can anyone help to shed light on why this might be the case?

       

      You can find the post with more detail here:

      Preceding loads, Datentypen und distinkte Werte - ks-qlik.com

       

      Looking forward to your ideas, I must be missing something :-)

       

      Regards,

      Sandro

        • Re: Influencing factors for performance of preceding load
          Marcus Sommer

          Hi Sandro,

           

          it's interesting what you are doing. I think you should repeat your experiment with more records and more columns maybe 10 M rows and 3 - 10 columns to gain more practically relevance. By only 1 k of records the measuring inaccurany will be relative big and could be distorted by overhead-processes for the loadings.

           

          - Marcus

            • Re: Influencing factors for performance of preceding load
              Sandro Krumbein

              Hello Marcus!

               

              The time is measured like this:

               

              Call StartTimer();

              for i = 0 to 99

                   NoConcatenate

                   Table$(i):

                   load Field1;

                   load Field1 resident Table;

              next

              Call StopTimer();

               

              The time difference is then broken down to 1 execution of the command. And then the 100 copies are dropped and all this is repeated for another 99 times, randomly choosing any of the variations. So I can't really see any overhead from loading, yet.

               

              But I won't resist your input. I'll run it with more rows and columns and let you know  :-)

                • Re: Influencing factors for performance of preceding load
                  Marcus Sommer

                  With overhead I meant that qv needs to evaluate each load-statement and check exists the field and table already or needs them to create new, is logging or automatically saving enabled and probably some thing more. By very small datasets this overhead could be longer as the load itself.

                   

                  Maybe this discussion is also helpful: Re: Re: String Key vs Integer Key -- any difference in link performance?.

                   

                  - Marcus

                    • Re: Influencing factors for performance of preceding load
                      Sandro Krumbein

                      Okay, I see your point. There must be some preparation going on before the command is actually executed and that might way in heavily with very small data sets. I have now gone up to 1.000.000 rows with 100.000 distinct values each and 3 columns. I measured the time for one copy action with preceding load. On average that took 4,34 seconds, I guess that's big enough relative to whatever overhead there might be.

                       

                      The results in terms of processing time for different data types are the same. Float and Integer taking 290 % more time than the Text one.

                       

                      The article you mentioned states, that the connection for calculating formulas in the UI is made based on the pointers. That in my opinion is only a further indication, that internally QlikView works with the pointers all the time, also when copying from one table to another like in my example.

                       

                      Which suggests (according to my current understanding) that the pointers for the Text field is actually smaller than the Float and Integer ones. I had a look at the QVDs for 1000 rows and 100 distinct values for each data type:

                       

                      Float: BitWidth = 8 Bit, Length = 90

                      Integer: BitWidth = 8 Bit, Length = 50

                      Text: BitWidth = 8 Bit, Length = 70

                       

                      I am assuming (please correct me, if I am wrong) that the BitWidth is the size of the pointer? At 200 distinct values it still is at 8 Bit, from 300 distinct values it is at 14 Bit. Either way, this doesn't seem to be the reason for what I am seeing.

                       

                      Any more ideas?

                       

                      Regards,

                      Sandro