For posts in my blog I did experiments on how using a preceding load influences the performance of a load command and which factors contribute to the performance difference. In my last post I ran into results, that I can't explain.
When using a command with a preceding load layer to copy values from one table to another, there seems to be no dependence on the number of distinct values (which is what I expected as the values are represented by more bits in the pointers in the symbol table), but there is a dependence on the type of data in the copied field (which I didn't expect).
Can anyone help to shed light on why this might be the case?
You can find the post with more detail here:
Looking forward to your ideas, I must be missing something :-)
it's interesting what you are doing. I think you should repeat your experiment with more records and more columns maybe 10 M rows and 3 - 10 columns to gain more practically relevance. By only 1 k of records the measuring inaccurany will be relative big and could be distorted by overhead-processes for the loadings.
The time is measured like this:
for i = 0 to 99
load Field1 resident Table;
The time difference is then broken down to 1 execution of the command. And then the 100 copies are dropped and all this is repeated for another 99 times, randomly choosing any of the variations. So I can't really see any overhead from loading, yet.
But I won't resist your input. I'll run it with more rows and columns and let you know :-)
With overhead I meant that qv needs to evaluate each load-statement and check exists the field and table already or needs them to create new, is logging or automatically saving enabled and probably some thing more. By very small datasets this overhead could be longer as the load itself.
Maybe this discussion is also helpful: Re: Re: String Key vs Integer Key -- any difference in link performance?.
Okay, I see your point. There must be some preparation going on before the command is actually executed and that might way in heavily with very small data sets. I have now gone up to 1.000.000 rows with 100.000 distinct values each and 3 columns. I measured the time for one copy action with preceding load. On average that took 4,34 seconds, I guess that's big enough relative to whatever overhead there might be.
The results in terms of processing time for different data types are the same. Float and Integer taking 290 % more time than the Text one.
The article you mentioned states, that the connection for calculating formulas in the UI is made based on the pointers. That in my opinion is only a further indication, that internally QlikView works with the pointers all the time, also when copying from one table to another like in my example.
Which suggests (according to my current understanding) that the pointers for the Text field is actually smaller than the Float and Integer ones. I had a look at the QVDs for 1000 rows and 100 distinct values for each data type:
Float: BitWidth = 8 Bit, Length = 90
Integer: BitWidth = 8 Bit, Length = 50
Text: BitWidth = 8 Bit, Length = 70
I am assuming (please correct me, if I am wrong) that the BitWidth is the size of the pointer? At 200 distinct values it still is at 8 Bit, from 300 distinct values it is at 14 Bit. Either way, this doesn't seem to be the reason for what I am seeing.
Any more ideas?
It's hard to comment without seeing the actual test qvw. I agree with Marcus that the sample size is probably too small to draw accurate conclusions. Is it possible to post your test qvw?
Sure, here we go.
1. Run "01 Creating the data.qvw" (to create the QVDs needed to run the test)
2. Run "02 Running the tests.qvw" (to load test setups and be able to start a test)
3. Select a test and click "Run test" to actually run it and measure performance data
We are talking about test 5 vs. 7 here.
Hope it works for you, haven't put much effort into having this work out of the box, yet.
Let me know any comments you might have :-)
P.S. QVWs created with Personal Edition