Dissecting the Engine API - Part 5: Multiple Hyper... - Qlik Community

Michael_Tarallo · ‎2021-10-04

My friend, Øystein Kolsrud - Software Architect at Qlik, is back with part 5 of the Qlik Engine API fundamentals:

Multiple Hypercube Dimensions

Previous articles:

With part 5 of this series Øystein will discuss some consequences of how the engine computes the data sets that hypercubes define.

Introduction

The hypercubes we have looked at so far were used to calculate multiple expressions for all values of a single field. We have seen how we in one go can compute both the sales, and the sales per month for all employees. Our next step will be to compute these statistics not only for the total set of data, but for sales per year as well. The data we have contains sales information for two different years, so if we want to calculate our expressions for a particular year, then we could simply select that year, and retrieve the data. The engine will do all the filtering for us and guarantee that only sales for the selected year is included in the hypercube computations. It is quite possible to use such a flow to iterate across the years in order to get sales information for each year, but as anyone who has ever added a table visualization to an app knows, there is a more efficient way to do this! We can simply extend the hypercube with the "Year" dimension, and have the engine do all the work for us!

How the engine deals with multiple dimensions

When the engine encounters multiple dimensions in a hypercube, it will compute the expressions for all possible combinations of values of the dimensions. Since our data contains information about sales for two employees from the years 2019 and 2020 we get the following set of four combinations:

"Amalia Craig", 2019
"Amalia Craig", 2020
"Amanda Honda", 2019
"Amanda Honda", 2020

The hypercube definition we will use for our computations needs to contain two entries in the "qDimensions" array, like this:

{
  "jsonrpc": "2.0",
  "id": 21,
  "method": "SetProperties",
  "handle": 4,
  "params": [
    {
      "qInfo": {
        "qId": "3cb898dc-b7dc-44ed-ba83-c27fae5d0658",
        "qType": "myCube"
      },
      "qHyperCubeDef": {
        "qDimensions":
          [ { "qDef": { "qFieldDefs": [ "SalesRep"  ] } },
            { "qDef": { "qFieldDefs": [ "Year" ] } }
          ],
        "qMeasures":
          [ { "qDef": { "qDef": "Sum(Sales)" } },
            { "qDef": { "qDef": "Sum(Sales)/Count(Month)" } }
          ]
      }
    }
  ]
}

Now that we have added a second dimension, the resulting table will contain four columns (two dimensions, two measures), so in order to get all data, we need to set the width of the page ("qWidth") to 4:

{
  "jsonrpc": "2.0",
  "id": 22,
  "method": "GetHyperCubeData",
  "handle": 4,
  "params": [
    "/qHyperCubeDef",
    [ { "qLeft": 0,
        "qTop": 0,
        "qWidth": 4,
        "qHeight": 20
      }
    ]
  ]
}

The response will contain 4 rows with 4 cells each for a total of 16 cells (and I'll be truncating to reduce verbosity):

{
  "jsonrpc": "2.0",
  "id": 22,
  "result": {
    "qDataPages": [
      {
        "qMatrix": [
          [ { "qText": "Amalia Craig", ... },
            { "qText": "2019", ... },
            { "qText": "104480", ... },
            { "qText": "2749.4736842105", ... }
          ],
          [ { "qText": "Amalia Craig", ... },
            { "qText": "2020", ... },
            { "qText": "156662", ... },
            { "qText": "2701.0689655172", ... }
          ],
          [ { "qText": "Amanda Honda", ... },
            { "qText": "2019", ... },
            { "qText": "84396", ... },
            { "qText": "2482.2352941176", ... }
          ],
          [ { "qText": "Amanda Honda", ... },
            { "qText": "2020", ... },
            { "qText": "169293", ... },
            { "qText": "2821.55", ... }
          ]
        ],
        ...
      }
    ]
  }
}

And there you have it! The engine has computed our two expressions for both employees for both years!

The size of hypercubes

The data rows you get for a particular cube depends not only on the number of field values of the dimensions, but also on how those values relate to each other in the data model. Say for instance that "Amanda Honda" had no sales recorded for the year 2019 (perhaps because she was hired in 2020). The loaded data will then not contain any records associating "Amanda Honda" with the year "2019". When you get the data for the hypercube in this scenario, then there will only be three rows returned as the combination of the field values "Amanda Honda" and "2019" is excluded by the data.

You can check how many rows a hypercube has by looking at the property "qSize" of the hypercube layout. This is what the "GetLayout" response could look like in our case:

{
  "jsonrpc": "2.0",
  "id": 23,
  "result": {
    "qLayout": {
      "qInfo": {
        "qId": "3cb898dc-b7dc-44ed-ba83-c27fae5d0658",
        "qType": "myCube"
      },
      "qHyperCube": {
        "qSize": {
          "qcx": 4,
          "qcy": 3
        },
        "qDimensionInfo": [ ... ],
        "qMeasureInfo": [ ... ],
        ...
      }
    }
  }
}

The "qSize" property indicates both the number of columns ("qcx") and the number of rows ("qcy").

Excluded combinations

One might think that there should be a row generated also for combinations that do not have values. After all, one could without lying say that the sales for "Amanda Honda" in 2019 was 0. But this is not how the engine works! And that is a good thing! If the engine didn't do this type of filtering, then most cubes would result in such a large number of rows that they would be completely unusable. Consider for example that all sales transactions had both a timestamp and a unique ID associated with them. Then we create a hypercube that lists the timestamp, the ID, and the sales amount for each transaction represented in 1000 USD. We could define our cube like this:

"qHyperCubeDef": {

  "qDimensions":

    [ { "qDef": { "qFieldDefs": [ "Timestamp"  ] } },

      { "qDef": { "qFieldDefs": [ "ID" ] } }

    ],

  "qMeasures":

    [ { "qDef": { "qDef": "Sales/1000" } } ]

}

The number of rows this cube would yield would be equal to the number of IDs, as there is exactly one timestamp associated with each ID. Also, in this case it clearly makes no sense to return rows that show the sales for nonexistent combinations of "Timestamp" and "ID". In fact, if we did, then we would soon run into trouble. Imagine that there are 10 IDs. The resulting table would in that case contain 10*10=100 rows (if we assume that the timestamps are unique as well). If there were 1000 IDs then there would be a million rows. And if there were a million IDs then you'd better dive for cover and hope you don't bring down the entire engine when you try to compute the layout! Because the resulting number of rows would be astronomical!

It's good to be aware of this engine behavior as there are in fact cases where you risk creating such huge cubes. I have accidentally done so myself on several occasions. And I can tell you, it's not a good thing... The problem occurs when you add two fields to a cube that have no association to each other what so ever. A company could for instance have data for both sales and purchases, both of which have unique ID's, but are stored in different tables. Then I create a table visualization with timestamp and ID, but accidentally choose the ID from the "Purchases" table instead of from the "Sales" table like this:

"qHyperCubeDef": {

  "qDimensions":

    [ { "qDef": { "qFieldDefs": [ "Sales.Timestamp"  ] } },

      { "qDef": { "qFieldDefs": [ "Purchases.ID" ] } }

    ],

  "qMeasures":

    [ { "qDef": { "qDef": "Sales.Sales/1000" } } ]

}

Since all possible combinations of "Sales.Timestamp" and "Purchases.ID" are legal, this is likely to give me a hypercube of a rather intimidating size.

Summary

Hypercubes with multiple dimensions is a very common thing in Qlik Sense. Apart from the obvious example of a table visualization, most other visualizations allow for some form of multiple dimension configuration as well. Two examples that I frequently use are the bar chart and line chart visualizations, both of which will have cubes with multiple dimensions under certain configurations.

The handling of hypercubes is at the very core of how the engine deals with computations, but there are of course limits to what the engine can handle. If you run into surprisingly long computation times or very high engine memory usage, then be on the lookout for visualizations that accidentally span very large sets of field value combinations. Such cubes can sometimes give you nasty surprises.

Dissecting the Engine API - Part 5: Multiple Hypercube Dimensions