nafi_qlikview
Partner - Contributor II

Speeding up Aggr by reusing cache

Hi,

I've spent a lot of time on this already, without any success. Maybe you guys can help.

In my data model, I have a list of quote requests and corresponding quotes for each request:

QUOTEREQUESTS:

QUOTEREQUESTID, CREATED

QUOTES:

QUOTEREQUESTID, RANK, PRODUCTID, PRICE, ...

I need to do a number of calculations on the best quote per quote request. E.g. average PRICE of the best quote.

Depending on who uses the dashboard, the view of products is restricted, so I can't simply filter on RANK = 1 or pre-aggregate; instead I have to aggregate dynamically based on the minimum RANK.

In order to calculate the average PRICE for the best quote, as mentioned in my example, I use something like:

Avg(Aggr(FirstSortedValue(PRICE, RANK), QUOTEREQUESTID))

Due to the data volume in my application, this is already quite slow for this one expression. Since I have to calculate a number of other metrics based on fields other than PRICE, the calculation times add up.
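
For illustration, the other metrics follow exactly the same pattern, just on different fields (QUANTITY and LEADTIME are only placeholders here for my real fields):

Avg(Aggr(FirstSortedValue(PRICE, RANK), QUOTEREQUESTID))
Avg(Aggr(FirstSortedValue(QUANTITY, RANK), QUOTEREQUESTID))
Avg(Aggr(FirstSortedValue(LEADTIME, RANK), QUOTEREQUESTID))

So the same per-QUOTEREQUESTID aggregation is effectively repeated for every expression.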

Now, I was wondering if anyone has an idea how I could improve this. One idea I had was to somehow dynamically cache the information for the best quote (so that it's only aggregated once) and then calculate everything from the cached results. However, I couldn't work out how to do this, since Aggr can only store one value per dimension value. Another idea was to store (aggregate) the best PRODUCTID per quote request and then somehow use set analysis on this, but I couldn't work out how to do that either.

Also, this whole Aggr operation seems to be single-threaded. Any idea why? I wouldn't logically have expected that.

I would appreciate your input!

Martin

13 Replies
jonathandienst
Partner - Champion III

Hi

I think you will need to set a flag in the load script for the best quote of a given quote request. Then you won't need an Aggr(); a simple set expression on the flag will be much more efficient.
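
Something along these lines (just a sketch using the field names from your post; the FROM source and the best-quote rule are placeholders):

QUOTES:
LOAD
    QUOTEREQUESTID,
    RANK,
    PRODUCTID,
    PRICE,
    If(RANK = 1, 1, 0) AS BestQuoteFlag    // flag the best quote per request
FROM ...;

The chart expression then becomes a plain set expression without Aggr:

Avg({< BestQuoteFlag = {1} >} PRICE)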

HTH

Jonathan

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
nafi_qlikview
Partner - Contributor II
Author

As mentioned, the 'best quote' depends on who logs on to the dashboard, as the view of products is restricted and filters can be applied on top.

E.g. products 1, 2, 3 and 4 exist in the dashboard. Now a user logs on who can only see 2, 3 and 4, and applies a filter to look only at 2 and 4. Given this data set, the best quote can be different from what it would be if all products were available, and hence it can't be pre-flagged in the load script.

jonathandienst
Partner - Champion III

That does not make sense - why would the *best* quote depend on who is looking at it?

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
nafi_qlikview
Partner - Contributor II
Author

Apologies if that wasn't clear. Think of it as different companies looking at how their (best) products performed.

So say product 1 belongs to company 1, and products 2 and 3 belong to company 2. When company 2 looks at the dashboard, they want to know how THEIR products performed and which of THEIR products was best (even though it may not have ranked first overall). Otherwise I could simply filter on RANK = 1.

Hope that clarifies it.

jonathandienst
Partner - Champion III

Hard to answer, as your requirement is more complex than it appeared in the original post.

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
marcus_sommer

Maybe you could speed up the calculations by applying a filter inside the expression, like:

Avg(Aggr(FirstSortedValue({< Month = {">=$(=Max(Month) - 2)"} >} PRICE, RANK), QUOTEREQUESTID))

or something similar to limit the amount of data. Calculation conditions could also be helpful, to force users to select a certain subset of the data before the chart is calculated.
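
For example, a calculation condition on the chart could be something like this (the threshold of six months is only an illustration):

GetPossibleCount(Month) <= 6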

Another way could be to find a different approach to displaying these data, maybe with restrictions on the chart tabs, or to create a calculated dimension so that you could avoid the Aggr calculation over each row.

- Marcus

nafi_qlikview
Partner - Contributor II
Author

The date period filter is already in there; I just left it out here for simplicity. This indeed improves performance noticeably; however, the dashboard includes an option to see all data for a rolling 24 months, which then takes quite a while to calculate.

The aggregated result is currently displayed in a text box as a single value, and I don't intend to change this. I will think about your calculated dimension suggestion. I have a feeling that this may be the right direction, although it's basically what I'm doing with the Aggr. Maybe I can design it in a way that it stores both QUOTEREQUESTID and PRODUCTID, which would solve my problem. Any ideas welcome!
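
Something roughly like this is what I have in mind as the combined result, untested and just to show the shape of the idea:

Aggr(QUOTEREQUESTID & '|' & FirstSortedValue(PRODUCTID, RANK), QUOTEREQUESTID)

i.e. a single Aggr that returns each quote request together with its best product, which the individual metrics could then reuse.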

Martin

marcus_sommer

Maybe a different structure in your data model could improve the calculations within the GUI. GUI calculations over a single (fact) table are often faster than calculations that have to hop between several tables. That means: try to put all fields that you need for these calculations into one single table - maybe an extra one.
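
For example, something like this in the load script (only a sketch with the field names from this thread; your real keys and additional fields may differ):

// pull the request-level fields into the quotes table
LEFT JOIN (QUOTES)
LOAD
    QUOTEREQUESTID,
    CREATED
RESIDENT QUOTEREQUESTS;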

- Marcus

nafi_qlikview
Partner - Contributor II
Author

OK, I've tried moving the calendar lookup one hop closer to the main fact table (only one hop in total) as well as including it in the main fact table (which doubled the size of my app, as the main fact table has hundreds of millions of records). Unfortunately, none of this made a noticeable difference, probably because the joins are already on unique keys and fully indexed in the original data model.


So, back to the calculated dimension idea. If I could somehow store QUOTEREQUESTID and PRODUCTID for the best quote and then reuse the result of this calculation as a dimension for all my metrics, they could reuse the cache and should be much faster. Any ideas how this could be achieved?