I've noticed that in some cases, the order of data in a fact table can have a significant impact on chart calc times. This comes into play when the primary dimension of the chart has many values and the order of those values in the table is highly disorganized. The attached qvw is an example (somewhat exaggerated). When the 500k values of OrderId are sorted, the pivot table calculates in ~6,000 milliseconds. When a Join is added -- which causes the OrderId values to become disorganized -- the calc time jumps to ~15,000 milliseconds.
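For anyone wanting to try the workaround, a minimal sketch of restoring the sort order after the join is below. The table and field names are assumptions for illustration, not taken from the attached qvw:

```
// Hypothetical sketch: re-sort the fact table after a Join has
// scrambled the OrderId order. Names (OrderDetail, OrderId) are
// illustrative, not from the attached example.
SortedDetail:
NoConcatenate
LOAD *
Resident OrderDetail
Order By OrderId;

Drop Table OrderDetail;
Rename Table SortedDetail To OrderDetail;
```

Note that the resident reload adds script time, so whether it pays off depends on how often the chart is calculated versus how often the app is reloaded.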
Has anyone else noticed this effect, or does anyone have an explanation to add?
Attached is a paper detailing the testing I did and all the sample files if you want to investigate or confirm for yourself.
I never tested or noticed this with the calc times of a chart. But I have noticed a similar behaviour within the script when a GROUP BY load is applied to a sorted table versus an unsorted one, as described here: Optimize Group By Performance.
I think it's technically much the same, so I assume the cause of the differences in calc/run times is also the same: Qlik needs a few more hops, or processing steps, for a disordered field than it would if it could read/process the values in one continuous stream.
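To make the GROUP BY case concrete, here is a rough sketch of the pattern I mean, with illustrative table and field names (Fact, OrderId, LineTotal are assumptions, not from the attached files):

```
// Sketch: aggregate a resident table after first sorting it by the
// grouping field. In my tests the GROUP BY over the sorted copy
// runs noticeably faster than over an unsorted source.
Sorted:
NoConcatenate
LOAD OrderId, LineTotal
Resident Fact
Order By OrderId;

Totals:
LOAD OrderId,
     Sum(LineTotal) As OrderTotal
Resident Sorted
Group By OrderId;

Drop Table Sorted;
```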
If you just have sum(aggr(sum(LineTotal), OrderId, Product)) as the expression and X as the dimension for the pivot, then the calc times are reasonably close. This makes me think the order of the data is helping the pivot table draw the dimensions faster rather than helping with the actual calculations. That sort of makes sense, since your order details pivot is 5M rows -- the same size as the OrderDetail table -- so it helps when the data is already sorted in that order.
That said, I have seen calc times be significantly impacted by the order of fields in a table as well, which is more puzzling than the order of the records.