Optimizing sums with fields from two tables
Robert Ögren May 25, 2015 9:15 AMGreetings everyone,
I am looking for ways to optimize a sum expression that uses fields from two tables.
As an example, take the expression Sum( NoOfUnits * UnitCost ), where NoOfUnits is in one fact table, and UnitCost is in another. As explained in this article,
https://community.qlik.com/blogs/qlikviewdesignblog/2013/08/20/thecalculationengine
QlikView will generate all applicable combinations of the field values before summing them.
The problem in my case is that both tables can be very large, even though the field that links them such as ProductID, does not have that many distinct values. Generating all combinations will result in a very large amount of combinations to summarize.
Take for example this scenario: The first fact table contains order lines, with a number of dimensions such as product ID, customer ID, sales channel, and the number of units sold for that order line. The number of order lines can be many millions, even though the number of distinct products is only a few hundred.
The second fact table contains the unit cost for each product, broken down into many components that form the product, and other dimensions such as component supplier.
It would be possible to combine these tables in the load script, but that would create one extremely large fact table, with many billions of rows. The idea with two fact tables is that these represent a more normalized view of the data. In the example, the breakdown of product cost onto components is the same regardless of which customer buys that specific product, but it will be different for different products.
An example of a simple analysis is that the user selects a set of order lines, for example by choosing a customer, a customer group, or a sales channel. Based on the selected order lines, we want to display the component cost per component.
Both intuitively and mathematically, it should yield the same result, but in a more efficient way, if we first summarize NoOfUnits per product for the selected order lines, and then combine that with the fact table that contains unit cost per product and other dimensions, instead of taking each separate order line and combine with the fact table with unit cost. This has to be done after the selection has been made, since we would like the user to have full flexibility in how they choose the order lines.
For the most simple analysis, which just shows the total amount per product for the selection, it seems to work well to replace the slower Sum( NoOfUnits * UnitCost ) with the following expression:
Sum (aggr(sum(NoOfUnits)*sum(UnitCost), Product))
It summarizes the number of units per product and the unit cost per product, and then multiplies them.
However, I have been unable to create an expression that always works if I want to include dimensions from one or both of the fact tables in the analysis.
I'll try to attach a sample document that uses randomly generated data that can be used to show the problem. On my machine, when selecting one product group, it takes a few seconds to load the "Amount per product slow" analysis, while "Amount per product fast", which uses the more complicated expression with aggr above, loads more or less instantly. Below that is an "Amount per component" which I have not been able to get faster. The load script can easily be changed to see the effects of larger or smaller number of rows in the tables.
Here are some workarounds that I already know of, if it simply is not possible to create a more efficient expression:
 Creating several documents for different kinds of analyses, where the fact tables are summarized to only contain the dimensions necessary for that particular analysis. For example, one analysis where the user must select a customer group. We can summarize the order lines per product and customer group, which will reduce the amount of data. This limits the flexibility of the user though.
 Increase the computation resources (more/faster CPU cores, more memory, adjusting working set settings and so on). The computations are not performed more efficiently, but they can be done faster with more expensive hardware.
 Precache common selections after reloading the document.
Thanks in advance for any suggestions you may have,
Robert

QvOptSumEmpty.qvw 176.0 K