1 2 3 Previous Next

Qlik Design Blog

126 Posts authored by: Henric Cronström

Today, the QIX engine has some bugs in the area of a search and a subsequent select. These affect both interactive searches and searches in Set Analysis expressions. We are working on fixing them. However, one of these coming bug fixes may cause some backward incompatibility. This post explains what the bug fix will mean, and what you can do to avoid future problems.


When you use Set analysis, you can choose to have a condition in the form of a list of values in the Set expression, or you can choose to have a rule-based definition - a search for field values.

 

Examples of hard coded lists:

 

Sum( {$<Year = {2013,2014,2015}>} Sales)

Sum( {$<Country = {'Australia','Canada','France'}>} Sales)

 

Examples of searches:

 

Sum( {$<Year = {">=2013"}>} Sales)

Sum( {$<Country = {"Austr*"}>} Sales)

Sum( {$<Customer = {"=Sum(Sales)>100000"}>} Sales)

 

All the above constructions work today, and they will work correctly also after the bug fix.

 

Note the double quotes in the search expressions. These define the search strings, and between them you can write any search string – the same way as you would in a list box search.

 

However, you should not use single quotes to define a search string.

 

Single quotes are reserved for literals, i.e. exact matches. Hence they should be used when you want to make a list of valid field values, like the above list of explicit countries. But they should not be used for searches. Single quotes imply a case sensitive match with a single field value.

 

This difference between single and double quotes has been correctly described in the reference manual that states that searches are “always defined by the use of double quotes”. Also, all examples with literal field values in the documentation use single quotes.

 

Now to the bug: QlikView and Qlik Sense currently do not respect the above difference between single and double quotes. Instead, strings enclosed in single quotes are interpreted as search strings. As a consequence, it is not possible to make case sensitive matches with field values, something which you sometimes want to do.

 

This bug will be fixed in the coming November releases of Qlik Sense and QlikView. Then, wildcards in strings enclosed by single quotes will no longer be considered as wildcards. Instead, the strings will be interpreted as literals, i.e. the engine will try to match the string with a field value containing the ‘*’ character. The same logic will apply also to relational operators and equals signs.

 

Unfortunately, this bug has been incorrectly utilized by some developers: I have seen Set Analysis expressions with correct search strings, but erroneously enclosed in single quotes; for example

 

Sum( {$<Country = {'Austr*'}>} Sales)

 

This search should instead have been written

 

Sum( {$<Country = {"Austr*"}>} Sales)

 

Hence, there are documents with expressions that will not work in the corrected algorithm. However, the bug fix will be implemented in such a way that old documents will use the old logic, and new documents will use the new logic. In addition, it will be possible to force the new logic for all documents by using a parameter in Settings.ini.

 

You can of course also change the single quotes in existing search strings to double quotes, and the expression will continue to do what it always has done.

 

This post is mainly to give you a heads-up on a coming change, and give you a chance to make a quality control of your own documents. We will publish more information as we get closer to the release.

 

HIC

 

Further information related to Set Analysis and Searches:

A Primer on Set Analysis

The Search String

Introduction to Set Analysis (video) - Part 1

Set Analysis - Part 2 - Cool Stuff You Can Do (video)

“Which products contribute to the first 80% of our turnover?”

 

This type of question is common in all types of business intelligence. I say “type of question” since it appears in many different forms: Sometimes it concerns products, but it can just as well concern any dimension, e.g. customer, supplier, sales person, etc. Further, here the question was about turnover, but it can just as well be e.g. number of support cases, or number of defect deliveries, etc.

 

QV Bar chart.png

 

It is called Pareto analysis or ABC analysis and I have already written a blog post on this topic. However, in the previous post I only explained how to create a measure which showed the Pareto class. I never showed how to create a dimension based on a Pareto classification – simply because it wasn’t possible.

 

But now it is.

 

But first things first. The logic for a Pareto analysis is that you first sort the products according to size, then accumulate the numbers, and finally calculate the accumulated measure as a percentage of the total. The products contributing to the first 80% are your best, your “A” products. The next 10% are your “B” products, and the last 10% are your “C” products. In the above graph, these classes are shown as colors on the bars.

 

The previous post shows how this can be done in a chart measure using the Above() function. However, if you use the same logic, but instead inside a sorted Aggr() function, you can achieve the same thing without relying on the chart sort order. The sorted Aggr() function is a fairly recent innovation, and you can read more about it here.

 

The sorting is needed to calculate the proper accumulated percentages, which will give you the Pareto classes. So if you want to classify your products, the new expression to use is

 

=Aggr(

    If(Rangesum(Above(Sum({1} Sales)/Sum({1} total Sales),1,RowNo()))<0.8, 'A',

        If(Rangesum(Above(Sum({1} Sales)/Sum({1} total Sales),1,RowNo()))<0.9, 'B',

            'C')),

    (Product,(=Sum({1} Sales),Desc))

    )

 

The first parameter of the Aggr() – the nested If()-functions – is in principle the same as the measure in the previous post. Look there for an explanation.

 

The second parameter of the Aggr(), the inner dimension, contains the magic of the sorted Aggr():

 

    (Product,(=Sum({1} Sales),Desc))

 

This structured parameter specifies that the field Product should be used as dimension, and its values should be sorted descending according to Sum({1} Sales). Note the equals sign. This is necessary if you want to sort by expression.

 

So the Products inside the Aggr() will be sorted descending, and for each Product the accumulated relative sales in percent will be calculated, which in turn is used to determine the Pareto classes.

 

The set analysis {1} is necessary if you want the classification to be independent of the made selection. Without it, the classification will change every time the selection changes. A perhaps better alternative is to use {$<Product=>}. Then a selection in Product (or in the Pareto class itself) will not affect the classification, but all other selections will.

 

This expression can be used either as dimension in a chart, or in a list box. Below I have used the Pareto class as first dimension in a pivot table.

 

QS Pivot.png

 

If you use this expression in a list box, you can directly select the Pareto class you want to look at.

QS List box.png

 

The other measures in the pivot table are the exclusive and inclusive accumulated relative sales, respectively. I.e. the lower and upper bounds of the product sales share:

 

Exclusive accumulated relative sales (lower bound):

=Min(Aggr(

    Rangesum(Above(Sum({1} Sales)/Sum({1} total Sales),1,RowNo())),
    (Product,(=Sum({1} Sales),Desc))

  ))

 

Inclusive accumulated relative sales (upper bound):

=Max(Aggr(

    Rangesum(Above(Sum({1} Sales)/Sum({1} total Sales),0,RowNo())),
    (Product,(=Sum({1} Sales),Desc))

  ))

 

Good luck in creating your Pareto dimension!

 

HIC

 

Further reading related to this topic:

The sortable Aggr function is finally here!

Recipe for a Pareto Analysis

Recipe for an ABC Analysis

The most common way to group days, months and years in the world today is the Gregorian calendar. However, there are also other types of calendars used around the globe. This post is about how to create a non-Gregorian calendar in Qlik Sense or QlikView. 

 

The Gregorian calendar was introduced by pope Gregorius XIII in 1582. It is by far the most common calendar in the world today. Before this, the Julian calendar was used.

 

The Julian calendar with the Anno Domini era was used from around 500 AD, and is still today used by the Christian orthodox churches. The basic rules are well known: 365 days in a year, and a leap year with an additional day every fourth year.


But this results in too many leap years, so the calendar year slowly drifts off from the tropical year. Eventually we will have midsummer in May or even April. By 1582 the difference between the tropical year and the calendar year was 10 days. This problem was fixed by the introduction of the Gregorian calendar.

 

The Gregorian calendar has leap years every fourth year, just as the Julian, but it doesn’t have leap years every change of a century, even though these years fulfil the rule of "every 4th year". Only every 4th century change is a leap year, the other ones are not. Hence, the year 1900 was not a leap year in the Gregorian calendar, but it was in the Julian.

 

The Julian calendar is easy to recreate in a master calendar in a Qlik app. All you need to do is to generate all days in a four-year cycle and assign the appropriate months and day numbers.

 

Julian.png

 

But there are also other calendars, and to create these you need help tables; tables that list when the month or year starts, and tables with the names of the months or the week days.

 

The Hijri calendar – or Islamic calendar – is used by Muslims all over the world to determine the proper days for the annual fasting and to celebrate other Islamic holidays and festivals. The first year in the calendar was 622 AD during which the emigration of Muhammad from Mecca to Medina took place. This journey is known as the Hijra.

 

It is a purely lunar calendar, containing 12 months based on the motion of the moon. This means that the Hijri year is always shorter than the tropical year, and therefore it shifts with respect to the Gregorian calendar.

 

To create a Hijri calendar, you need a table containing the month starts expressed as Gregorian dates. From this, you can generate a Hijri master calendar for your data model. It is however important to understand that this calendar, like any other Hijri calendar based on calculation, only gives estimated dates. The calendar is not based on the actual sighting of the moon, which is required for the beginning of some of the months. To get a proper calendar for religious purposes you should contact your local Muslim scholar.

 

Hijri.png

 

The Hebrew calendar is used today predominantly for Jewish religious observances. It is a lunisolar calendar with 12 months based on the motion of the moon. However, to prevent it from shifting with respect to the seasons, a leap month is inserted approximately every third year.

 

To create a Hebrew calendar, you need a table containing the month starts expressed as Gregorian dates. From this, you can generate a Hebrew master calendar for your data model.

 

Hebrew.png

 

The Shamsi calendar, also known as Persian calendar or the Jalaali Calendar, is the official calendar in Iran. It is a purely solar calendar, containing 12 months originally based on the zodiac constellations. The year always starts at the vernal equinox as seen from the Tehran horizon. This means that the Shamsi calendar never shifts with respect to the tropical year. Further, no rules for leap years are needed: Depending on when the vernal equinoxes occur, some years automatically become leap years, others not.

 

To create a Shamsi calendar, you need a table containing the vernal equinoxes expressed as Gregorian dates. From this, you can generate a Shamsi master calendar for your data model.

 

Shamsi.png

 

The French Republican calendar is a purely solar calendar, containing 12 months, each with 30 days. At the end of the year, there are 5 or 6 additional days. It originally started at the autumnal equinox as seen from the Paris horizon, but it is not clear whether the intent was to have the year always start at the autumnal equinox, and thereby solving the leap year question, or the intent was to use Gregorian-like leap year rules. However, it is possible to use the autumnal equinox to recreate this calendar also for our time.

 

So, to create a French Republican calendar, you need a table containing the autumnal equinoxes expressed as Gregorian dates. From this, you can generate a French Republican master calendar for your data model. Its practical use can be questioned, perhaps, but the poetic month names can make it worthwhile.

 

French.png

 

Scripts for all the above calendars can be found on Non-Gregorian calendars. The scripts all generate non-Gregorian dates and assign these to existing Gregorian dates.

 

You should see these scripts as templates and examples of how to include a non-Gregorian calendar in your app, but don't trust the content too much - there may still be errors. Change the input data, if needed, and use scripts as models for yet additional calendars.

 

HIC

 

Further reading related to this topic:

Non-Gregorian calendars

Recipe for a 4-4-5 Calendar

Calendars

Calendars are used in most Qlik apps. In most cases, a standard Gregorian calendar is used, but in some cases a more complex calendar is needed. This post is about how to create a week-based fiscal calendar of a 4-4-5 type.

 

The 4-4-5 calendar is a week-based calendar, where the year is divided into 4 quarters, each with 3 months. The first month has 4 weeks, the second has 4 weeks, and the last has 5 weeks. Occasionally the 12th month has an additional week.

 

However, these quarters and months have nothing in common with the standard months. First, they are not in sync with the Gregorian calendar. The 4-4-5 year can for example start in the last week of August. Further, the 4-4-5 months do not have the same lengths as the Gregorian months.

 

Yearly calendar.png

 

One major advantage over a regular calendar is that the end date of the period is always the same day of the week, which is useful in planning. Similarly, the beginning of the year, the quarters and the months are all on the same day of the week. This means that different years start on different dates. In the table below this is clearly visible. The end of the year is blue and the beginning of the new year is green.

 

Year Shift2.png

 

The definition of when the year starts is often described in a phrase like “the year ends on the last Saturday of August”. But this is the same as saying that

 

  • All weeks start on Sundays.
  • September 1st always belongs to week 1. This is the Reference Date.

 

These two bullets can be expressed as integer parameters, which can be used when generating the calendar in the Qlik script. For the first day of the week, 0 is used to denote Monday and 6 is used to denote Sunday. Further, the number of days between the reference date and the Dec 31st is used to define the beginning of the year.

 

Finding the reference day may sometimes take some thinking. To help you, I have compiled some examples:

 

Ref Date Table.png

 

Once the two parameters have been defined, a calendar can be created using e.g. the script that is found on Calendars. This script also creates some other calendars, e.g. 4-5-4, 5-4-4 and Broadcast calendars.

 

The logic in the script has many steps, but is still fairly straightforward. For each date, the script needs to

  1. Find the week start of the date
  2. Use the week start to find the reference date of the input date. Note that the relevant reference date sometimes is after the date itself.
  3. Use the reference date to find which fiscal year the date belongs to. The start date of the year is also calculated.
  4. Use the start of the year to calculate the day number within the year
  5. Finally, the day number of the year can be used to calculate the remaining fields.

 

Summary: It is possible to create a script that generates a correct 4-4-5 calendar.  Don’t hesitate to download the script example and modify it, if you need a 4-4-5 calendar.

 

HIC

 

Further reading related to this topic:

Calendars

Redefining the Week Start

Redefining the Week Numbers

Redefining the Week Start in Qlik Sense

Non-Gregorian calendars

Henric Cronström

Natural Sets

Posted by Henric Cronström Apr 12, 2016

Set Analysis is a commonly used tool when creating advanced formulas in Qlik Sense or QlikView. The basics are fairly simple, but it’s clear that some aspects of it are very complex. In this post, I will try to answer some simple questions that touch the core of the complexity, such as: Can all sets have set modifiers? Are some sets more fundamental than others?

 

All sets are not equal. Some sets are indeed more fundamental than others.

 

Apart from the two trivial sets {0} (nothing) and {1} (everything), the most fundamental group of sets are the Natural Sets. These are the sets that can be defined as a selection or through a Set modifier, e.g.

 

    {$}

    {$<Country={France}>}

    {BM01<Product=>}

 

Compare these to the non-natural sets, where set operators are used for the set definitions:

 

    {1-$}

    {BM01*$}

 

To understand the difference between the two, you need to know the difference between element sets (lists of distinct field values) and record sets (lists of records in the internal database). See Why is it called Set Analysis?.

 

Internally, all sets are stored as element sets and/or record sets in state vectors. For Natural sets both the element sets and the record set can be stored, but for the non-natural sets only the record set can be stored. The reason is that there are no well-defined element sets for non-natural sets.

 

Consider for instance the following table where you want to sum numbers that are not assigned to a customer:

 

Companies.png

 

With Set analysis, this is straightforward. The second expression calculates exactly this, using a non-natural set:

 

Pivot.png

 

But if you try to create the same filter using a selection in the list boxes, you will find that it is not possible - Null is not selectable.

 

List boxes.png

 

No matter how you select from the two list boxes, you cannot recreate this filter. Hence, a non-natural set cannot always be defined through a selection, and thus cannot be stored as element sets.

 

From a logical perspective, you could say that Natural sets are always defined by the element sets (the state vectors of the symbol tables), while the record set (the state vector of the data table) is just the result of the logical inference. For non-natural sets, it is the other way around: These are defined by the record set.

 

This has a number of consequences. First, it is not possible to use a set expression like

 

    { (BM01 * BM02) <Field={x,y}> }

 

where the normal (round) brackets imply that the intersection between BM01 and BM02 should be evaluated before the set modifier is applied. The reason is of course that there is no element set that can be modified.

 

Further, you cannot use non-natural sets inside the P() and E() functions. The reason is that these functions return an element set, but it is not possible to deduce that from the record set of a non-natural set.

 

Finally, a measure cannot always be attributed to the right dimensional value if a non-natural set is used. For example, in the chart below, you can see that some excluded sales numbers are attributed to the correct ProductCategories, whereas others have NULL as ProductCategory.

 

Misassignment.png

 

Whether the assignment is correctly made or not, depends on the data model. In this specific case, the number cannot be assigned if it pertains to a country that is excluded by the selection.

 

Summary:

  • Only natural sets can be re-created as interactive selections and stored in bookmarks
  • Only natural sets can be modified
  • Only natural sets can be used inside the P() and E() functions
  • A measure cannot always be attributed to the correct dimensional value if a non-natural set is used

 

HIC

 

Further reading related to this topic:

Why is it called Set Analysis?

Colors, States and State vectors

Symbol Tables and Bit-Stuffed Pointers

Databases are usually not very forgiving.

 

Strict rules apply, defining what’s allowed and what’s not. For example, you are not allowed to enter data unless it has the right data type and is formatted the right way. Further, you are often not allowed to enter a value for a foreign key unless this value already exists in the master table. And you are not allowed to enter the same value twice if the field is a primary key.

 

The reason is of course to ensure data integrity. Without such rules, the database would soon be cluttered with bad quality data and contain a large number of errors.

 

The fact is that a good system is one that has a large number of rules, but at the same time is easy to use: Equipped with a user interface designed in a way so that the user doesn’t notice the rules – or at least isn’t disturbed by them.

 

But with the QIX engine it is a very different situation.

 

QlikView and Qlik Sense should not make sure that the data is free from errors. Instead, they should do exactly the opposite: Display the source data along with all its errors. This requirement is totally different from the demands you have on a database, and as a result the QIX engine is built in a different way:

 

No Data Types.pngData Types

 

There are no data types in the QIX engine. The reason is simple: You may have data from different tables or even from different data sources in one single field. Then there is a potential risk that you have different data types in the different sources.

 

When loaded, all fields are converted into duals (number and text, or just text), and so one field can contain data that originally had different types.

 

Mixed Formats.pngFormatting

 

A single field can have a mixed data format. Also here, the reason is simple: Different sources may have different formats. As a result, it doesn’t matter if a date is formatted as 3/31/16, 2016-03-31 or 42460. They will all three represent March 31, 2016.

 

Each distinct field value has its own format, and a single field may thus be displayed with different formats.

 

No Referential Integrity.pngReferential integrity

 

The QIX engine does not enforce referential integrity. For example: You may have a customer ID in your fact table that does not exist in the customer table (which would be an error in the data integrity of the database). But the QIX engine will accept this and show NULL as customer name.

 

 

Assume Many To Many.pngRelationship type

 

Often you know if you have a many-to-one or a many-to-many relationship between two entities. But this information is not loaded from the database. Instead the QIX engine assumes worst case and is always prepared for a many-to-many relationship.

 

Links between tables don’t carry information about relationship type. And all calculations involve aggregations, since there is a possibility for multiple values of the referenced field.

 

The bottom line is that the QIX engine is a very forgiving engine. It handles errors in all of the above cases gracefully. No matter how many such errors you have in the data, the QIX engine will always make a best-effort attempt in evaluating and showing the loaded data.

 

HIC

 

Further reading related to this topic:

Data Types in QlikView

Automatic Number Interpretation

It’s all Aggregations

The Aggr() functions is one of the most advanced functions in the QIX engine, and it is not always easy to use. It does not get easier when you put set analysis expressions in it.

 

In one of my previous posts (Pitfalls of the Aggr function) I recommended having the set analysis expression in both the inner and the outer aggregation function when using set analysis in the Aggr() function. This is a good rule of thumb, because in most cases doing so will generate the result that you want.

 

But this is not always correct.

 

In more complex calculations you often need to use the condition in one place only – sometimes in the inner aggregation, sometimes in the outer. It depends on how the condition is formulated. Then it is important to understand the difference between the two positions.

 

The evaluation of the Aggr() function is a two-step process: In the first step, an intermediary virtual table is created using the inner aggregation and the dimension of the Aggr(). In the second step, this virtual table is aggregated using the outer aggregation.

 

For example, say that you want to find the largest order value per year. Then you would need to first calculate the sales value per order, and in a second step find the largest of these values. Hence

 

Formula1.png

 

The first step aggregates the source data (with multiple records per Order ID) into a virtual table with one record per Order ID, and the second step finds the largest values in the virtual table.

 

Two-steps.png

 

However, there is not yet any set analysis in the expression. So, let us use the following requirement instead:

 

  1. Show the largest order value per year
  2. Include only products from the product group “Shoes” in the order value
  3. Calculate these numbers only for 2014 and 2015

 

The two conditions correspond to the following set analysis expression:

 

Formula2.png

 

But where should this expression be written? In the outer or in the inner aggregation?

 

To answer this question, we must ask ourselves in which step the conditions should be used. Then it becomes obvious that the condition in product group must be used in step one – in the inner aggregation. If it is used in the outer aggregation only, the order values will be incorrect – they will be calculated from all products.

 

The condition in year, however, can be put in either place. Hence, the following expression will work fine:

 

Formula6.png

 

From the above example one might draw the conclusion that you always should put the condition in the inner aggregation. But this is not the case. Sometimes you have a condition that cannot be put in the inner aggregation. The following requirement can serve as example:

 

  1. Per year, show the bestselling product within the product group “Shoes”
  2. Show how this product ranks compared to all products, also non-shoes

 

The solution is the following table

 

Ranks.png

 

The condition in product group should be evaluated in step two, so the expressions used for Product and Rank are:

 

Formula5.png

 

Here it is not possible to have the condition on product group in the inner aggregation, since this would interfere with the calculation of the rank. You must have it in the outer aggregation.

 

Bottom lines are:

  • You need to figure out if your condition should be evaluated in step one (in the inner aggregation) or in step two (in the outer aggregation). This will determine where to put the set analysis expression. You may need to use The Feynman Problem Solving Algorithm.
  • If you can’t figure out where to put your set analysis expression – try putting it in both the outer and the inner aggregation function, and keep your fingers crossed. Afterwards you should however verify that the numbers are what you want.

 

HIC

 

Further reading related to this topic:

Pitfalls of the Aggr function

A Primer on Set Analysis

In Set expressions, an equals sign can be used to assign a selection to a field. But this is not the only assignment symbol that can be used. There are a couple of others. This post is about the alternative assignments symbols that can be used, and their respective use cases.

 

Strictly speaking, the equals sign in set expressions is not an equals sign. Rather, it is an assignment symbol. This is the reason why you cannot use relational operators in its place. The equals sign assigns a selection state to a field e.g.

 

     { $ <Country = {Sweden, Germany, 'United States'}> }

 

In this case, the element set ‘Sweden’, ‘Germany’, ‘United States’ is assigned as selection to the field “Country”.

 

But what if the set identifier already has a selection in the field “Country”?

 

In such a case, the old selection will be replaced by the new one. It will be like first clearing the old selection of the field, then applying a new selection.

 

However, this is not the only way to assign a set as selection in a field. You can also use assignments with implicit set operators. These will use the existing selection in the field to define a new selection:

 

Implicit Union:

 

     { $ <Country += {'United States'}> }

 

Note the plus sign.

 

This expression will use the union between the existing selection and ‘United States’ as new selection in the field, i.e. add the listed values to the previously select ones. The use case is not a common one, but it happens sometimes that you always want to show a specific country (or product or customer) as a comparison to the existing selection. Then the implicit union can be used.

 

Implicit Intersection:

 

     { $ <Country *= {"=Continent='Europe' "}> }

 

Note the asterisk.

 

This will use the intersection between the existing selection of countries and the countries in Europe as new selection in the field. (The search is an expression search that picks out European countries.) The set expression will not remove any part of the existing condition – instead it will just be an additional limitation.

 

This construction can in many cases be used instead of a normal assignment. In fact, it is often a much better solution than the standard assignment, since it respects the user selection and thus is easier to understand.

 

The implicit intersection should be used more.

 

Implicit Exclusion:

 

     { $ <Country -= {'United States'}> }

 

Note the minus sign.

 

This expression will use the existing selection, but exclude ‘United States’. The use case for an implicit exclusion is exactly this – you want to exclude a value. Hence, this construction is very useful.

 

Implicit Symmetric Difference:

 

     { $ <Country /= {'United States'}> }

 

Note the slash.

 

The above expression will select values that belong either to existing selection, or to the values in the set expression, but not to both. It’s like an XOR. I have never used this, and I would like to challenge the readers to come up with a relevant use case.

 

Bottom line: Assignments with implicit set operators are sometimes very useful. You should definitely consider using the implicit intersection and the implicit exclusion.

 

HIC

 

Further reading related to this topic:

A Primer on Set Analysis

Why is it called Set Analysis?

Facts in BI solutions are usually additive – but not always. To avoid mistakes, it is important to understand when you can sum the number, and when you cannot. This post will not only help you understand the problem, but also point at some possible ways to handle non-additive numbers.

 

Numbers fall into one of three categories:

 

  • Fully additive numbers, which can be summed across any of the dimensions. Most transactional amounts are additive.
  • Semi-additive numbers, which can be summed across some dimensions, but not all. An example is warehouse balances: these are amounts that are additive across all dimensions except time.
  • Non-additive numbers, which cannot be summed over any dimension. An example is the gross margin for a product.

 

Additive numbers are straightforward: Just use the Sum() function in your measure, and everything will work.

 

Semi-additive and non-additive numbers are however not as straightforward. These will be shown correctly only in a chart with exactly the same grain as the source data, i.e. if the chart dimension(s) is the primary key in the source table. In all other cases the sum function will potentially return incorrect results. In other words - these numbers are often unusable in a dynamic analysis app.

 

But there are ways to get around the problem:

 

Balances

Balances and other numbers describing a situation in time are usually semi-additive. They can be summed over all dimensions, except time. Adding the balance for one month with that of the next month would not make sense. But it does make sense to sum balances from different warehouses. So for balances you need to adapt the expression to reflect this. If your data contains one balance number per month, you can often use an expression similar to one of the following:

 

     Average:      Sum( Balance ) / Count( distinct Month )

     Last value:   FirstSortedValue( Aggr( Sum( Balance ) , Month ), -Month )

 

In the table below, you can see the population for some countries during four years. By dividing by the number of years, the semi-additive population can be used also in the total column.

 

Population.png

 

Ratios

Ratios, percentages and averages are usually non-additive numbers. Examples are the gross margin and the average order value. If you have such fields in your source data, it is likely that your data already has been aggregated once. Such fields are often calculated by dividing one number with another, e.g. by expressions similar to the following:

 

     GrossMargin           = ( Sum ( Revenue ) - Sum( CostOfGoods ) ) / Sum( Revenue )

     AverageOrderValue  = Sum ( OrderValue )  / Count ( distinct OrderID )

 

Note that the numerators and the denominators all are additive numbers!

 

Hence, if you have the original numerator and the denominator in your source data, you should use these in your application, instead of the pre-calculated ratios. In other words: Use the above expressions as measures in charts or gauges to define your gross margin and average order value. Do not use the pre-calculated ratios.

 

Mixed Units

In some cases, you have mixed units in a field, which makes the numbers non-additive. The most common case is that you have mixed currencies. This is a problem that is easily solved: Just convert the numbers to a common currency or common unit already in the script, by multiplying with the appropriate currency rate, e.g.

 

     LocalCurrency * CurrencyRate as CommonCurrency,

 

Then you can use the common currency as an additive field. You may need a join or an Applymap() to get the correct currency rate into the fact table.

 

Incomplete Data

There is also the case of incomplete data, i.e. where you have the total number and the numbers for some parts, but data is missing for some other parts. In such a case, you need to convert this source table to a complete set of data, e.g. by removing the "Total" record and introducing an "Others" record that is calculated from the total minus the sum of all known parts.

 

Bottom line: If you have semi-additive or non-additive numbers, you need to convert these to something usable using one of the methods described here. If you can't, the numbers are almost useless and you should use the Only() function when you display them, thereby preventing a summation.

 

HIC

 

Further reading related to this topic:

Scales of Measurement

Henric Cronström

Accumulations

Posted by Henric Cronström Nov 10, 2015

When building analytical applications you very often encounter cases where you want to accumulate numbers inside a chart. It could be that you want to calculate a Year-to-date number, or a rolling 6 month average, or a moving annual total.


When creating an accumulation, there are two fundamental challenges. First, there is the question on how to treat dimensional values with no data. The tables below illustrate the problem:

 

Missing month.png

 

In this example, there is no data for the month of March. The default behaviour is that this row then is omitted, see left table. This leads to an incorrect accumulation – the rolling 3 total for April includes January, which it shouldn’t. The table to the right however treats missing months correctly.

 

Secondly, there is the question on how to treat the first dimension when there are several dimensions. The tables below illustrate the problem:

 

First dimension.png

 

In the left table, you want the accumulation to re-start for each new product. You do not want the accumulation to continue from the previous product. Note that for ‘2014 Jan’ the accumulation is reset.

 

But in the right table, it is the other way around: A moving annual total should span over two calendar years, so you want the accumulation to continue over into a new value of the first dimension.

 

In QlikView and Qlik Sense there are several ways to create accumulations, each with its own pros and cons.

 

1. Using the control in QlikView charts

In a QlikView chart, you can easily create an accumulation just by selecting the right chart setting:

 

Chart UI accumulation.png

 

This method will however fail both the above challenges. Also, an accumulation can only be made over the first dimension, and the accumulation will always be reset for a new value in the second dimension.

 

Further, you cannot use it to calculate moving averages. It only works for calculations of sums.

 

2. Using the Above() function

If you use the Above() function in a chart table like

 

     RangeSum(Above(total Sum(Amount) , 0, 12))

 

or in an Aggr() table like

 

     Only(Aggr(RangeSum(Above(total Sum({Amount),0,12)), YearMonth))

 

you will get a 12-month moving total. It will pass the “multiple dimensions” challenge: By using – or not using – the total specifier in the Above() function you can decide which behaviour you want.

 

But it will fail the “no data” challenge. There is a however a way to make an accumulation in an Aggr() table work also for this problem: Trick the QIX engine to show all rows by adding a zero term for all dimensional values:

 

     Only(Aggr(RangeSum(Above(total Sum(Amount) + Sum({1} 0),0,12)), YearMonth))

 

and enable the display of zero values. Then you will get the right numbers also when some dimensional values are excluded.

 

3. Using an As-Of table in the data model

The As-Of table, is an excellent solution that will pass both challenges. Here you must set the condition using Set Analysis:

 

     Sum({$<MonthDiff={"<12"}>} Amount)

 

However, it has one drawback: When a user clicks in the chart, a selection will be made in the AsOfMonth. But you don’t want the user to make selections here: You want month selections to be made in the real month field. So I usually make my charts read-only if they use the AsOfMonth as dimension.

 

With this, I hope that you understand accumulations better.

 

HIC

 

Further reading related to this topic:

The Above Function

The As-Of Table

Calculating rolling n-period totals, averages or other aggregations

Henric Cronström

The As-Of Table

Posted by Henric Cronström Nov 3, 2015

Last week I wrote about how the Above() function can be used for calculating rolling averages and other accumulations. There is however also an alternative method for doing the same thing:


The As-Of table.

 

When you use the Above() function, you fetch a number from other rows in a chart or Aggr() table. The As-Of table is slightly different in this respect: It is not a transient table created by an object or an expression – instead it is a real table in the data model.

 

The idea is to create a secondary month field – the AsOfMonth - that links to multiple real months.

 

Fields.png

 

In the example above, you can see that ‘2015 Oct’ links to several preceding months, and each Month in turn links to several rows in a fact table. This means that a specific transaction will be linked to several AsOfMonths.

 

In the data model, the As-Of table should appear as a separate calendar table that links to the existing primary calendar table:

 

DataModel.png

 

One way to create this table is the following:

 

First, make sure that you in your master calendar have a field “Month” that is defined as the first date of the month, e.g.

 

Date(MonthStart(Date),'YYYY MMM') as Month,

 

Then add the following lines at the end of the script:

 

// ======== Create a list of distinct Months ========
tmpAsOfCalendar:
Load distinct Month
  Resident [Master Calendar] ;

 

// ======== Cartesian product with itself ========
Join (tmpAsOfCalendar)
Load Month as AsOfMonth
  Resident tmpAsOfCalendar ;

 

// ======== Reload, filter and calculate additional fields ========
[As-Of Calendar]:
Load Month,
  AsOfMonth,
  Round((AsOfMonth-Month)*12/365.2425) as MonthDiff,
  Year(AsOfMonth)-Year(Month) as YearDiff
  Resident tmpAsOfCalendar
      Where AsOfMonth >= Month;

Drop Table tmpAsOfCalendar;

 

Once this table has been created, you can use the AsOfMonth as dimension in charts where you want rolling averages and accumulations.

 

If you as measure use

 

Sum({$<YearDiff={0}>} Sales)

 

you will get a yearly accumulation – year-to-date up until the day of the script run.

 

Yearly accumulation.png

 

If you instead use

 

Sum({$<MonthDiff={"<6"}>} Sales) / Count(distinct {$<MonthDiff={"<6"}>} Month)

 

you will get a 6-month rolling average:

 

Rolling average.png

 

And finally, if you use

 

Sum({$<MonthDiff={0}>} Sales)

 

You will get the real, non-accumulated numbers.

 

I have made the Set Analysis expressions based on two fields: YearDiff and MonthDiff. However, for clarity it could be a good idea to add flags in the As-Of table, so that the Set Analysis expressions become even simpler, e.g.

 

If(MonthDiff=0,1,0) as IsSameMonth,
If(YearDiff=0,1,0) as IsSameYear,
If(MonthDiff<6,1,0) as IsRolling6,

 

Summary: The As-Of table is a good way to calculate rolling averages and accumulations.

 

HIC

 

Further reading related to this topic:

The Above Function

Accumulations

Calculating rolling n-period totals, averages or other aggregations

Henric Cronström

The Above Function

Posted by Henric Cronström Oct 27, 2015

The Above() function is a very special function. It is neither an aggregation function, nor a scalar function. Together with some other functions, e.g. Top(), Bottom() and Below(), it forms a separate group of functions: Chart inter-record functions. These functions have only one purpose: To get values from other rows within the same chart.

 

The basic construction is the following:

 

    Above( Sum( Sales ) )

 

This will calculate the sum of sales, but for the row above.

 

The most common use case is when you want to compare the value of a specific row with the value of the previous row; e.g. this month’s sales compared to last month’s sales.

 

MCT.png

 

Another use case is when you want to calculate rolling averages. Then you need to use the second and third parameter; the offset and the number of cells. Below, I use

 

    Above( Sum( Sales ), 0, 12 )

 

The function will return 12 rows: the value for current row and the 11 rows above. This means that you need to wrap it in a range function in order to merge all values to one value. In this case, I use RangeAvg() to calculate the average of the 12 rows.

 

RAT.png

Above.png

 

However, both the above solutions have a flaw: They don’t take excluded values into account. For example, if April is excluded due to a selection, the previous month of May becomes March, which probably isn’t what you want.

 

To correct this, you need to make the chart show all months, also the excluded ones. In QlikView, you have a chart option “Show all values” that you can use. A method that works also in Qlik Sense, is to add zero to all values, also for the excluded dimensional values:

 

    Sum( Sales ) + Sum( {1} 0 )

 

Make sure to “Show zero values”.

 

You can also use the Above() function inside an Aggr() function. Remember that the Aggr() produces a virtual table, and the Above() function can of course operate in this table instead. This opens tremendous new possibilities.

 

First, you can make the same calculations as above, by using

 

    Only(Aggr(Above(Sum({1} Sales)), YearMonth))

 

    Only(Aggr(RangeAvg(Above(Sum({1} Sales),0,12)), YearMonth))

 

Note the Set Analysis expression in the inner aggregation function. The {1} ensures that all values in the virtual table are calculated, so that the Above() function can fetch also the excluded ones. Using {1} is maybe too drastic – it is often better to use a Set expression that clears only some fields, e.g. {$<YearMonth=>}.

 

Further, you can have a virtual table that is sorted differently from the chart where the expression is displayed. For example, the expression

 

     Aggr(Above(Sum(Sales)),Year,Month)

 

displays the value from the previous month from the same year. But if you change the order of the dimensions, as in

 

     Aggr(Above(Sum(Sales)),Month,Year)

 

the expression will display the value from the same month from the previous year. The only difference is the order of the dimensions. The latter expression is sorted first by Month, then by Year. The result can be seen below:

 

Final.png

 

An Aggr() table is always sorted by the load order of the dimensions, one by one. This means that you can change the meaning of Above() by changing the order of the dimensions.

 

With this, I hope that you understand the Above() function better.

 

HIC

 

Further reading related to this topic:

Accumulative Sums

Accumulations

Pitfalls of the Aggr function

The Set Analysis is a commonly used tool when creating static or dynamic filters inside aggregations in Qlik Sense or QlikView. But sometimes you need the opposite to selecting – you need to exclude values. How can this be done?

 

In Set Analysis, it is straightforward to make selections; to define the criteria for inclusion. But it is not as straightforward to define an exclusion. But there are in fact several ways to do this.

 

First of all, an inclusion can be defined the following way:

 

Formula0.png

 

This expression is equivalent to saying “Sum the Amounts where Field equals X”.

 

But if you want to say the opposite – “where field does not equal X” – it becomes more complicated. The relation “not equal to” is not a Set operation so it cannot be used.

 

One way to do this is to use the implicit exclusion operator:

 

Formula1.png

 

Note the minus sign in front of the equals sign. This will create an element set based on the existing selected values, but with the value X removed.

 

A second way to do this is to use the unary exclusion operator:

 

Formula2.png

 

This will return the complement set of X and use this as element set in the set expression.

 

In many situations the two methods return identical sets. But there are cases when they are different. In the table below you can see that it makes a difference if there already is a selection in the field.

 

Image5.png

 

Note the difference on the row with the C. Since the implicit operator (the 2nd expression) is based on current selection, also the value C is excluded from the calculation. This is in contrast to the unary operator (the 3rd expression) that creates a completely new element set, not based on current selection.

 

We can also see that both of the above expressions exclude records where the field is NULL (the line where Field has a dash only). The reason is simple: As soon as there is a selection in the field, the logical inference will exclude NULL in the same field.

 

So what should you do if you want to exclude X but not NULL?

 

The answer is simple: Use another field for your selection. Typically you should use the primary key for the table where you find the Amount.

 

Formula3.png

 

Here you need to use the element function E(), that returns excluded values. Hence, the above Set expression says: “Select the IDs that get excluded when selecting X.”

 

Image7.png

 

The table confirms that we get the records we want: X is excluded but NULL is still included.

 

With this, I hope that you understand Set Analysis somewhat better.

 

HIC

 

Further reading related to this topic:

A Primer on Set Analysis

Why is it called Set Analysis?

Finding NULL

Set Analysis is a commonly used tool when creating advanced formulas in Qlik Sense or QlikView. But why is it called “Set Analysis”? And why are there so many curly brackets?


Set Analysis is really a simple and straightforward tool, but with an advanced syntax. “Simple” since its only function is to allow you to define a calculation scope different from the current selection. The basics are very much like defining a SQL Where clause.

 

A different term for the scope is the “record set”. With this, we mean the set of records in the data over which the aggregations should be calculated.

 

Image4.png

 

The above Set analysis expression is equivalent to selecting 2015 from OrderYear. It is also equivalent to a SQL condition “Where OrderYear=2015”.

 

The Set analysis syntax is advanced because there are so many different ways to define a record set; you can use a selection, a bookmark, or an alternate state. Further, just as in any set algebra, you can define intersections, unions and other set operations, e.g. the difference between two sets:

 

Image1.png

 

This expression describes the Set difference between current selection $ and the alternate state State2: I.e. records that are included by the current selection, but not included in State2.

 

Venn.png

 

Another set expression is the following:

 

Image3.png

 

This defines the records belonging to any order date that occurs before the latest order date in the current selection. The dollar expansion is needed, since the parser expects field values in clear text in the search string.

 

Note that the last Set expression contains two pairs of curly brackets. Set analysis expressions with many curly brackets look complicated, but there is in fact method in the madness…

 

  • The outer brackets: Sum( { … } Sales )
    These define the record set; i.e. the records in the data over which the aggregation should be made.
  • The inner brackets: Sum( {$<Field= { … } >} Sales )
    These define the element set; i.e. the set of individual field values that define the selection in this field.

 

For those of you who have read about Symbol Tables and Bit-Stuffed Pointers it may be interesting to know that the element set operate on the symbol tables, whereas the record set is defined on the data tables.

 

The element set can be a list of values or a search string. It can also be described by field references or by the P() and E() functions, e.g.

 

Image5.png

 

Summary:

  • There are two different sets in a Set expression: The Record Set and the Element Set.
  • The record set (the outer curly brackets) defines the records in the data tables. Use 1, $, a bookmark or an alternate state as Set identifier.
  • The element set (the inner curly brackets) defines the selection that modifies a record set. Use one or several of the following to define your element set:
    • A list of values, e.g: {2013,2014,2015}
    • A search, e.g: {">=2015-01-06<=2015-02-05"}
    • A field reference
    • A set function, i.e: P() or E()
  • Set operators can be used to define both the record set and the element set.

 

With this, I hope that you understand Set Analysis somewhat better.

 

HIC

 

Further reading related to this topic:

A Primer on Set Analysis

Symbol Tables and Bit-Stuffed Pointers

Implicit Set Operators

The Aggr() functions is one of the most advanced functions in the QIX engine, and it is not always easy to use. This blog post is about its most common pitfalls.

 

This is a function that enables the app developer to create nested aggregations, i.e. aggregations in two steps. Essentially the Aggr() is a For-Next loop where each loop contains a measure calculation. When all loops have been performed, a final aggregation based on the array of calculated measures can be made.

 

If we, for instance, want to calculate the monthly product sales, we could do it by using

 

Aggr( Sum(Sales), ProductName, OrderMonth )

 

This will produce a virtual table that looks like

 

Image1.png

 

Note that there is exactly one row per distinct combination of the two dimensions.

 

So, the Aggr() function creates a virtual table; an array of values that need to be aggregated in a second step. Hence you should wrap the Aggr() in an outer aggregation function. For instance, if you want to use the above table to calculate the average monthly sales, you should use

 

Avg( Aggr( Sum(Sales), ProductName, OrderMonth ))

 

Image2.png

 

With this mental picture of how the function works, we can start looking at the pitfalls.

 

1. Missing inner aggregation function

 

The first parameter of the Aggr() is a measure, and as such it is always an aggregation. Hence, you should use an aggregation function. If you don’t, the calculation will use Only() and the virtual table may inadvertently get NULLs in the Measure column. See more on Use Aggregation Functions!

 

2. Missing outer aggregation function

 

The outer aggregation function is needed whenever Aggr() returns more than one value. If no aggregation function is specified, Only() will be used, which may cause your chart measure to contain NULLs.


Note the totals line in the chart above: It shows the average over both products and months. Generally, totals will always be NULL if the outer aggregation is omitted.

 

3. Missing Set Analysis expression

 

So, there are both inner and outer aggregation functions. Where do I put my Set Analysis expression?

 

The answer is often “In both”. It may not be enough to have it in only one of the levels.

 

The current selection (or the relevant selection state) will always affect the aggregation scope of an aggregation function, unless you specify otherwise using Set Analysis. So, if you need Set Analysis you should in most cases have similar or identical Set Analysis expressions in both the inner and outer aggregation function.

 

4. Grain mismatch

 

When an Aggr() is used in a chart, you have inner dimensions in the Aggr() and outer dimensions in the chart. Then it is important that these match.

 

The grain of the Aggr() dimensions must be identical or finer than that of the chart dimensions.

 

For example, an Aggr() with Month as dimension can be put in a chart that has Year as dimension. In the following table, the average monthly sales numbers are correctly calculated, and displayed as an average per year and product.

 

Aggr correct grain.png

 

But the opposite isn’t necessarily possible. If you put an Aggr() with Year as dimension in a chart with Month as dimension, you will get problems. You may get a table that looks like this:

 

Aggr incorrect grain.png

 

Here, the Aggr() function has produced one number per product and year, and this number has been assigned to one single month while the other months contain NULL. This is a result of the grain mismatch and most likely not what you want.

 

With this, I hope that you have a better understanding of Aggr().

 

HIC

 

Further reading related to this topic:

AGGR...

When should the Aggr() function NOT be used?

Use Aggregation Functions!

Filter Blog

By date:
By tag: