P() vs Aggr()

Dolphin · ‎2022-09-27

Hi everyone,

I have two expressions returning the number of countries that include customers with more then 500.000 in sales:

Count({<[Country] = P({1 <[CustomerID] = {"=Sum([Sales]) > 500000"}>} [Country])>} Distinct [Country])
Count({<[Country] = {"=Max({1} Aggr(Sum([Sales]), CustomerID)) > 500000"}>} Distinct [Country])

My Question is, am I missing a case where there can be a difference in results?

If not, how do the two expression compare in performance?

Or the other way around, can you give me an example where the P() function can not be replaced with another SetExpression?

Thanks and kind regards,

Johannes

marcus_sommer · ‎2022-09-27

I think it will depend on the context of data + data-model + used object + selections if your expressions are returning always the expected results - and if both would return the same.

Beside this you may change the expression to something like:

if(Sum([Sales]) > 500000, Count(Distinct [Country]))

Further to measure the performance you may just observe your object. If you don't notice a difference you may neglect the topic - and if there is a difference you will see which one runs better.

- Marcus

Dolphin · ‎2022-09-27

Hi Marcus,

let me specify my case a bit more. Imagine a single table with:

OrderIDs
CustomerID
Sales
Country
Year

Let's also change the expression slightly, to sum the sales of specific countries instead:

Sum({<[Country] = P({1 <[CustomerID] = {"=Sum([Sales]) > 500000"}>} [Country])>} [Sales])

A single order always has less then 500.000 sales, but one customer may have a total sum of more then 500.000 sales (maybe only over multiple years). Now I want to sum the sales for the countries where at least one of those customers is from.

I don't have an example at hand where I would notice any performance difference. What I was looking for is some insights on how P() is working. It's probably calculating a similar thing to what Aggr() does.

If there is an example without a feasible alternative to P() it would be rly helpful to get a deeper understanding.

Thanks & Regards,

Johannes

marcus_sommer · ‎2022-09-27

p() returns the possible values of a field - this means those which were directly selected and green or those which were white because no other selection has excluded them. And within the p() any further set analysis might be specified to overwrite/ignore/adjust the made selections and/or related selections from other fields.

This means it's purely a read and/or specified selection. The entire set analysis is mainly a selection whereby each condition returned just TRUE or FALSE to the underlying field-values. Nesting them won't change the behaviour else just applying more or less complex intersections of parts of the data-set.

To conclude this a set analysis worked on a field- respectively column-level against the stored field-values whereby not only complete field-values must be queried else als0 searches are possible. The same like you used a >= X or a *search-string* on a list-box.

Not possible is instead to make queries against multiple values like aggregation does. At least not within a "classical" set analysis even if it's possible to integrate them within the set analysis syntax and so it remains in reality an if-loop and therefore my suggestion to outsource it from the set analysis. IMO it simplified the logic and the syntax and although I never made a comprehensive benchmark on the various ways to apply such if-loops I doubt that there will be significantly differences (meant in regard to your example).

The above is completely independent to the scope in which an aggregation needs to be done. Means if it's necessary to calculate the aggregation on a different level as the object has and/or to use such results as a filter for other calculations and/or it's needed to aggregate them further for any counts/average you will mostly need an aggr() to be able to specify the dimensional context in which the aggregation should be performed.

- Marcus

General Question