Qlik Community

Ask a Question

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Announcements
Support Cases coming to Qlik Community Oct. 4! Start chats, open cases, explore resources. Prep for the big move: READ DETAILS
Henric_Cronström

Calculated fields are often created in the script and stored under new aliases. But you can also create them in the user interface. What are the pros and cons with the two methods? And how are the user interface fields calculated?

Normally, new fields are created in the script and stored as additional columns in the data model. Just write your expression inside a Load statement and you’re done:

Calculated field in script.png

But you can also do the same thing in the user interface, and then it could look like this:

Year as dimension.png

 

So, which way should you do it?

Generally, I would say that you should put as much as possible in the script. In most cases, it is far better to have these calculations pre-made, so that they do not have to be calculated at run-time, i.e. when the user clicks.

The Qlik Engine has two fundamentally different ways to calculate such expressions: As “Calculated dimension” or as “Field-on-the-fly”. The engine automatically decides how a specific calculation should be made, depending on the expression.

 

Fields-on-the-fly

This method was introduced in one of the early versions of Qlik Sense. As the expression is evaluated, the engine creates an additional column in the data model, with a corresponding symbol table. Just as for a real field, the selection is stored in state vectors linked to this column.

In the picture below you can see a table dimension defined as “=Year(Date)”, which results in four rows.

Now look at the selection bar: When a selection is made, the corresponding year is selected in the Field-on-the-fly called “=Year(Date)” – a field that does not exist in the original data model. And in the selection bar you can see that the selection is indeed stored in this “virtual” field, and not in the Date field.

Field on the fly.png

 

Calculated dimensions

This is the old-fashioned way, and this is how QlikView still today does it. In the example below, the table dimension is “=Aggr(Year(Date),Date)” and it also results in four rows. Logically, this expression is equivalent to the above one.

But here the selection is instead made in in the underlying field: in the Date field.

Calculated dimension1.png

 

It is always possible to create a Calculated dimension, no matter what the expression looks like. But the same is not true for Fields-on-the-fly. There are limitations to when they can be generated:

  • The expression must be based on one single field only, or on multiple non-key fields from the same table in the data model
  • The expression cannot depend on the selection state, e.g. through the GetSelectedCount() function 
  • The expression cannot contain an Aggr() function

 

If a Field-on-the-fly cannot be generated, the expression will be evaluated as a Calculated dimension instead.

 

Performance

Both Calculated dimensions and Fields-on-the-fly can cause performance problems, so it is a good idea to consider moving them to the script instead. Fields-on-the-fly can almost always be moved to the script.

For Fields-on-the-fly, the performance problems become especially severe if the underlying field has many distinct values. A common example is when calendar functions like Year and Month are used on a timestamp with millions of distinct values, rather than on a date with fewer values, like 2 x 365 dates. Further; since Fields-on-the-fly are added to the data model, and the hash of the data model is used in the ID of the cache entry, Fields-on-the-fly can prevent the cache from being re-used properly.

To improve the performance, Master dimensions containing Fields-on-the-fly are now (since Nov 2019) calculated already when the first user opens the app, something which can increase the time it takes to open a document. On the other hand, this will improve the response time considerably in the analysis phase, as well as mitigate cache problems, so we are confident that this is a correct decision. Hence, put your Fields-on-the-fly in the Master dimensions!

Should you want to tweak the behavior of the engine, you can always try the following:

  • Using “UseAutoFieldOnTheFly=0” in Settings.ini will disable Fields-on-the-fly for all documents served by the engine
  • Using “Set QlikInternalDisableFotfMode=1;” in the script will disable Fields-on-fly in the app
  • Using “Set QlikInternalDisableFotfPregen =1;” in the script will prevent Fields-on-fly from being pre-calculated when the app is opened
  • Wrapping the expression in “=CalcDim(…)” will force it to be a Calculated dimension

 

But most importantly - don't use a timestamp to create your calendar! Use a date instead:

Timestamt2Date.png

Good luck!

HIC

Tags (1)
13 Comments
Henric_Cronström
  1. Is the performance hit a 1-time thing? Yes, in the sense that they are calculated and cached. But there is a theoretical possibility that they are purged from the cache if they are not used often.
    They should not per se cause slower performance in a hypercube calculation.
  2.  I have not used the Engine API to create fields-on-the-fly myself, so I might be wrong... but I think the answers are Yes, Yes, and No. 
  3. I don't think there is a "Remove field on the fly", and I don't think there would be any benefit either.
0 Likes
881 Views
Jeff_Koch
Employee
Employee

I've done some testing on this relative to your first question and the answer in my experience is a solid, NO.   I did very little clicking around and the calculations were dropped out of cache and had to be recalculated; so while HIC correctly identifies this as a theoretical possibility, I found the likelihood of it happening is high.

My testing was done around calendar calculations associated with derived calendars the load script Declare and Derived Field statements.    The derived fields are calculated field definitions that are applied behind the scenes.   Derived calendar fields tend to have very few distinct values yet the calculated field took several seconds to calculated the first time.  After that the performance was effectively identical to a data model based calendar - a few milliseconds.  However, despite the size of the calculated field being very small, it would get dropped out of cache as I moved around the app performing my analysis.

Because of the performance hits I've always avoided calculated fields whenever possible and upon discovering and validating the performance impact of "Derived" fields I've stopped using the Declare-Derive unless some aspect of the analytic requires a dynamic dimension.

Best Practice (to reiterate HIC's statement above): 

"Both Calculated dimensions and Fields-on-the-fly can cause performance problems, so it is a good idea to consider moving them to the script instead. Fields-on-the-fly can almost always be moved to the script."

868 Views
Lech_Miszkiewicz

Hi @Henric_Cronströmas always - pleasure to read and learn from you and as usual there is something from underneath the bonnet

 

Lech

0 Likes
299 Views