PradeepK
Creator II

Calculated Dimension: Which approach is better?

Can anyone explain which approach is better for adding conditions to calculated dimensions, and why?

First:

Aggr(
    if(sum([Total Sale Amount]) > 1000 AND sum([Total Sale Amount]) < 8000,
        Location)
, Location)

Second:

Aggr(
    Only(
        {< Location = {"= sum([Total Sale Amount]) > 1000 AND sum([Total Sale Amount]) < 8000"} >}
        Location)
, Location)

 

The expressions above are a simple example I created.

I also know you can always move the conditions to the script and create flags.. and so on 😄 (a rough sketch below).
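
For completeness, a sketch of what I mean by flags in the script (the Sales table name and the flag name are made up; the thresholds match my example):

// aggregate once per Location in the script and store the result as a flag
LocationFlags:
LOAD
    Location,
    if(sum([Total Sale Amount]) > 1000 and sum([Total Sale Amount]) < 8000, 1, 0) as FlagInRange
Resident Sales
Group By Location;

The chart expression then shrinks to sum({< FlagInRange = {1} >} [Total Sale Amount]).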

Point being: is a condition inside a set expression any better than an if..else statement?

5 Replies
JosephMorales
Partner - Contributor II

Hi @PradeepK 

You can use both If() and a Set Analysis expression, but Set Analysis has the upper hand in performance and ease of use. When you have many conditions you have to chain many if-else statements and logical operators, but in Set Analysis it stays very simple. You can use set operators (*, +, -, /), identifiers representing the current or previous selections ($, $1, $_1), and functions like P() and E(), which are not easily possible with If().
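
For instance, a minimal sketch of P() and E() with the field names from the question (the Year field and its value are my assumptions, not part of the original example):

// all sales for locations that had any sales in 2023
sum({< Location = P({< Year = {2023} >} Location) >} [Total Sale Amount])

// all sales for locations that had no sales in 2023
sum({< Location = E({< Year = {2023} >} Location) >} [Total Sale Amount])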

Using If() can cause performance issues, since it has to be evaluated row by row.

You can read this article https://community.qlik.com/t5/QlikView-Creating-Analytics/Performance-Set-Analysis-vs-IF-vs-Multipli...

Best Regards,

Joseph Morales

marcus_sommer

An alternative could be:

class(aggr(sum([Total Sale Amount]), Location), 1000)

and instead of the 1000 you may use a variable to get more flexible clusters - the users then just select the clusters they want.
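
A minimal sketch of that idea (the variable name vBucketSize is just an example; it could be defined in the script or bound to an input box):

// e.g. in the script: SET vBucketSize = 1000;
class(aggr(sum([Total Sale Amount]), Location), $(vBucketSize))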

- Marcus

PradeepK
Creator II
Author

Hi @marcus_sommer

Thanks for the suggestion.. but I'm not looking for a data-bucket dimension; the condition in the example is just a mock-up and will change with different requirements.

The question is more about whether using an if condition vs. Set Analysis makes any difference in this case, and if so, what is happening under the hood..?

I have seen many responses on the community regarding this comparison, but none so far that explains how exactly the Qlik engine computes set expressions.. what makes them faster.. and where the field symbol tables fit into all this.

A link to any technical Qlik design article would be appreciated.

Regards,

Pradeep

marcus_sommer

I think there is no documentation that explains the internal processing of UI calculations in depth - especially not in combination with how the dimensional context behind a calculation is built, nor with regard to comparing set analysis against classical conditions.

Therefore, here are my knowledge and experiences, and some deductions from them.

The most important part isn't the expression itself but the creation of a virtual table covering the dimensional context on which the various object expressions are performed. This step is a purely single-threaded measure which must happen before any of the expressions are executed.

How much does it cost? That's quite difficult to estimate because it depends mainly on the dataset, meaning:

  • overall size
  • structure - meaning the kind of data model
  • how the relevant fields are associated
  • cardinality of the included fields
  • data types - pure numeric values vs. record-level formatted values vs. string values
  • maybe some more factors ...

Nevertheless, if I had to guess the shares of the calculation independent of the concrete data model, I would tend to:

  • 80% to build the virtual table
  • 10% to perform the expressions
  • 10% to render the object

Of course that's a very rough assumption, deduced from my preferred approach - which is using a star-schema data model and rather simple expressions with things like sum({< RealSetAnalysis >} value), and avoiding if-loops (especially nested ones) as well as aggr constructs and inter-record functions.
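
To make that concrete: such a "real" set analysis expression could look like the following sketch (Year and FlagCurrent are made-up example fields, not from the question):

// pure column-level filtering against values/flags prepared in the script
sum({< Year = {2024}, FlagCurrent = {1} >} [Total Sale Amount])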

With a rather badly performing data model like the (quite common and popular) link table, in which the fields relevant for the respective object are widely distributed, even a very simple expression like sum(value) may push the effort of creating the virtual table far above 90%. On the other side, you could slow down a well-built data model with a rather small dataset quite heavily by using many (nested) if-loops and/or aggr constructs. I did that myself multiple times, especially in my early days with Qlik, because set analysis didn't exist at that time and of course everything was done with if-loops - and later, when we switched to set analysis within the expressions on the same data and data model, it was unbelievably fast: instead of minutes, as we were accustomed to with the usual BI / DB / office tools, it took just a few seconds, or we even got the results instantly.

So what does this mean? It means that looking at the object expressions is less important than building a data model suitable for your data and requirements. No errors or omissions within the data model can be cured by any smart and advanced logic in the UI.

Now to your real question. In general, both of your approaches are the same, because:

... Location = {"= sum([Total Sale Amount]) > 1000 ...

and

... if(sum([Total Sale Amount]) > 1000 ...

are both if-loops and are evaluated at row level.

The second is just an if-loop implemented within a set analysis - meaning only a different syntax and, as already hinted above, not a real set analysis. A real set analysis looks like:

... Field = {value} ...

and is nearly the same as a selection, which means the state value is set to TRUE/FALSE accordingly - i.e. the values are flagged within the system tables. This is therefore a column-level evaluation: all values marked FALSE aren't included in the virtual table and don't need to be calculated later within the expressions. That's why the performance difference between a column-level and a row-level evaluation is so huge.
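
To illustrate the difference with the fields from the question (FlagInRange is a hypothetical flag that would be created in the script):

// row-level: the condition must be evaluated for every single row
sum(if(FlagInRange = 1, [Total Sale Amount]))

// column-level: the FALSE-flagged values are excluded before the expression runs
sum({< FlagInRange = {1} >} [Total Sale Amount])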

In your case I would try both approaches and check the results against each other, whereby, as hinted, I wouldn't expect big differences; just from a feeling, I tend to believe that here the classical if-loop is faster, and also easier in terms of logic and development effort.

Besides this, I suggest considering my suggestion above of using buckets as a dimension for selections (and of course they could be made even more dynamic, because the kind of aggregation and/or the aggregated field and/or the aggr fields and/or any further conditions could also be controlled through variables and/or other selections) - see the sketch below.
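
A rough sketch of such a fully variable-driven bucket dimension (all variable names are hypothetical and would be set in the script or through input controls):

// e.g. SET vAggr = sum; SET vField = Total Sale Amount; SET vBucketSize = 1000;
class(aggr($(vAggr)([$(vField)]), Location), $(vBucketSize))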

- Marcus

PradeepK
Creator II
Author

@marcus_sommer Thank you for the detailed explanation!! You are awesome. 🙂