antoine816
Contributor

Very poor Aggr() performance

Hi everyone,

I'm having performance problems using the Aggr() function to answer a specific use case.

Please see attached my data model that contains:

  • a FACTS table that is essentially order lines (so multiple rows per customer), total of 17 million rows
  • a CALENDAR table that only has 36 rows (one per month, for 36 months) and is linked to FACTS on the month
  • a PERIOD table that maps each month to a period starting in January of the same year (so I can easily display Year To Month, i.e. from the first month of the year to the currently selected month)

It gets tricky when I want to calculate measures such as Average Basket (sales / transactions) or Items Per Transaction (quantity of items / transactions), as they need to be calculated at the customer level to prevent high-paying or very frequent customers from inflating those two measures.

What I'm doing is using an Aggr() to first calculate the two measures per customer, and then running an Avg() over the Aggr() to get the global average.

Something like this:

Avg(
    Aggr(
        Sum([Sell Out With Taxes Group Currency])
        /
        Count(Distinct [Order Id]),
        [Contact Id]
    )
)


So far so good. The formula works fine and returns the expected results. But the problem is in the loading time:

  • it's acceptable (3-4 seconds) when there is a filter on the YearMonth (a field directly inside the FACTS table)
  • it's a nightmare (20-25 seconds) when there is a filter on the Period (a field from a linked dimension table, which also increases the dataset size)

The weird thing is that when I create a table (in the front end) to display what the Aggr() is supposed to do, the table loads and displays all the results in 4-5 seconds (vs 20-25 seconds for the Aggr()).

Any idea how I can speed this up (without remodelling)?

Any idea how the Aggr() works and why it takes so much longer to compute compared to a classic table?

Thx a lot!

Antoine

2 Replies
QFabian
Specialist III

Hi @antoine81 , as you mentioned, there are a lot of rows to process, so you may have to think about creating that result set as a new table in your data model.
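For reference, the precompute-in-the-script idea could look something like this rough sketch. The table and field names are taken from the post; grouping by YearMonth is my assumption so that month filters still work. Note the caveat that Count(Distinct) is not additive across months, so the grain of the pre-aggregation has to match the grain you filter on:

// Sketch: pre-aggregate the per-customer ratio in the load script,
// so the UI only needs a cheap Avg() with no Aggr() at all.
CustomerBasket:
LOAD
    [Contact Id],
    YearMonth,
    Sum([Sell Out With Taxes Group Currency]) / Count(Distinct [Order Id]) AS BasketPerCustomer
RESIDENT FACTS
GROUP BY [Contact Id], YearMonth;

The front-end measure would then simply be Avg(BasketPerCustomer).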

But please try this expression; maybe it works:

Avg(
    Aggr(Sum([Sell Out With Taxes Group Currency]), [Contact Id])
    /
    Aggr(Count(Distinct [Order Id]), [Contact Id])
)

QFabian
marcus_sommer

Qlik needs to create a virtual table for each object, which provides the dimensional context on which the expressions are performed. This step is single-threaded and is usually the heaviest part of a calculation, while nearly all "normal" aggregations are multi-threaded.

An aggr() works the same way: it first creates such a virtual table and then performs the expression on it. If aggr() constructs are used within expressions, Qlik must create not just a single virtual table but as many as there are aggr() calls - and in a following step those tables need to be matched against each other, because the results need a relation. This means aggr() will always be a very expensive measure and should be avoided wherever possible.

In your case you can't avoid the use of an aggr(), but you could adjust your data model or use a different approach to get YTM results. Adjusting seems rather easy and means here not connecting the period table to the calendar but connecting it directly to the fact table. How much will it improve the performance? A bit - probably not enough for you, because the period table remains an as-of table.
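As an illustration of the "connect it directly" idea, the adjustment could be as simple as loading the PERIOD table's month key under the same field name that FACTS uses, so the association bypasses CALENDAR entirely (field and source names here are assumptions, not the poster's actual model):

// Sketch: reload PERIOD so its key field matches the month field in FACTS,
// creating a direct FACTS <-> PERIOD association instead of going via CALENDAR.
PERIOD:
LOAD
    Month,      // assumed: same field name as the month field in FACTS
    Period
RESIDENT PeriodSource;
DROP TABLE PeriodSource;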

I do like the idea of as-of tables and how easy it is to implement such logic and use it within the UI, but from a performance point of view it's not the most performant way to get the needed views. An alternative could be to create appropriate flags - especially if the views are mainly based on today() or month(today()) - which are then used as a multiplier and/or within a set analysis condition. If the views need to be completely flexible with regard to the selections, it takes a bit more effort, for example by creating a continuous month-counter field - maybe created with autonumber(year & month) - and then querying its max value within the set analysis and calculating appropriate offset values from it.
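A sketch of that month-counter approach (field names and the January alignment are my assumptions). In the script, number the months consecutively:

// AutoNumber() assigns 1, 2, 3, ... per distinct value in load order,
// so the source must be sorted by YearMonth first.
FACTS:
LOAD
    *,
    AutoNumber(YearMonth) AS MonthCounter
RESIDENT RawFacts;
DROP TABLE RawFacts;

Then a YTM measure clears the month/period selections and keeps everything from the start of the year of the latest selected month (assumes MonthCounter = 1 is a January; the set analysis is repeated in both inner aggregations because it is not inherited by the aggregations inside Aggr()):

Avg(
    Aggr(
        Sum({< YearMonth =, Period =,
               MonthCounter = {">=$(=Max(MonthCounter) - Mod(Max(MonthCounter) - 1, 12))<=$(=Max(MonthCounter))"}
            >} [Sell Out With Taxes Group Currency])
        /
        Count({< YearMonth =, Period =,
                 MonthCounter = {">=$(=Max(MonthCounter) - Mod(Max(MonthCounter) - 1, 12))<=$(=Max(MonthCounter))"}
              >} Distinct [Order Id]),
        [Contact Id]
    )
)

The dollar expansions $(=Max(MonthCounter)) evaluate against the current selections before the set modifier clears them, which is what makes the offset calculation work.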

Besides this, you should also keep your objects as small as possible - meaning don't use such an aggr() within a pivot with a lot of dimensions, but rather in a table chart with just a few dimensions. Of course you may then need multiple objects to see all the wanted data, but usually that's not a disadvantage but a benefit, because several smaller objects are much more readable than one large table in which you must scroll a lot and/or permanently track that you are looking at the right rows and columns.

- Marcus