Database Design for Qlik Analytics consumption

Last Update: Jun 23, 2024 10:19:59 PM
Updated By: Leigh_Kennedy
Created date: Jun 23, 2024 10:19:59 PM

Qlik Cloud Analytics has a very powerful and easy-to-use data loading and transformation framework, including no-code data flows and Qlik scripting for advanced transformations. However, there are times when a customer may prefer to handle these transformations as much as possible outside of Qlik Analytics. Maybe the data is being used by multiple Qlik Analytics environments, in a multi-tenant or hybrid architecture. Or maybe the data set needs to be consumed by other tools in addition to Qlik Analytics. You may also be looking to provide ready-to-use tables for self-service analytics. In these situations, it makes sense to move any logic earlier in the lifecycle to avoid duplication and so all tools can benefit from the transformations.

The aim of this guide is to provide some recommendations on how to model your database for minimal-transformation, or transformation-free, consumption by Qlik Analytics. We cover our top ten recommendations which, if followed, reduce, if not eliminate, the need for extensive transformations within your Qlik Analytics applications. While some of these recommendations assume a database as the data source, most are applicable across all data sources.

There are many powerful tools in Qlik’s Data Integration portfolio that can be used to achieve this, such as Talend Studio, Qlik Talend Data Integration or Qlik Compose. Qlik Talend Data Integration can even be used to replicate tables to QVDs in Qlik Cloud using Change Data Capture (CDC). For more information on these solutions, see Data Integration and Quality. However, this document will focus on the end goals, rather than the tools used to achieve them.

 

1. Use RDBMS views

I am going to start with what is a somewhat controversial, but I believe necessary, step for many organisations that wish to minimise downstream transformations. Views in an RDBMS allow us to encapsulate all the other changes we will need to make into a virtual table that we can simply read from our Qlik Analytics app. There are a few assumptions behind this approach, though, that if not true may render it unnecessary or undesirable. These assumptions are:

  1. The data resides in an RDBMS. Obviously, if the data source is a flat file, REST connection or other non-RDBMS source, this is not an option.
  2. An ETL/ELT tool is not desirable or not an option. Some organisations may prefer to use a tool to physically transform the data. In this case views would not be needed.
  3. SQL skills are stronger and/or more readily available than Qlik scripting in your organisation. If you don't have the skills to do this, then it may make more sense to do this in Qlik.
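
With those assumptions in place, a minimal sketch of such a view might look like the following. All schema, table and column names here are hypothetical, and the view also previews several of the later recommendations (column selection, prefixed audit columns and a static filter):

-- Hypothetical example: a view that prepares an orders table for Qlik Analytics.
-- Schema, table and column names are assumptions for illustration only.
CREATE VIEW analytics.v_orders AS
SELECT
    o.order_id,                              -- single-column primary key (see #4)
    o.customer_id,                           -- named to match the customer table's key (see #5)
    o.order_date,
    o.amount,
    o.last_modified AS orders_last_modified  -- prefixed to avoid accidental associations (see #3)
FROM dbo.orders AS o
WHERE o.order_status <> 'CANCELLED';         -- static filter pushed into the view (see #8)

The Qlik app can then simply read from analytics.v_orders with no further transformation.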

 

2. Exclude columns not relevant for analytics

Often when building a data warehouse, data lake or lakehouse, we tend to include all possible columns and data so it is available at a later date to meet changing requirements. However, pulling this data into a Qlik Sense app is not recommended unless there is a use for it. This is because Qlik Analytics uses a very powerful in-memory associative model to manage your data, and bringing unused data into memory will have a performance impact.
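
For example, a view can expose only the columns the analytics use case actually needs (again, all names here are hypothetical):

-- Hypothetical sketch: expose an explicit column list rather than SELECT *,
-- so wide operational-only columns never reach Qlik's in-memory model.
CREATE VIEW analytics.v_customer AS
SELECT customer_id,
       customer_name,
       country,
       segment
FROM dbo.customer;   -- internal notes, audit flags, etc. are deliberately left out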

 

3. Prefix standardised columns with the table name (e.g. last-modified)

It is quite common for tables in databases to include standardised columns, for example CreatedBy, CreatedDate, etc. These columns are useful for organisational governance and audit requirements, but will cause problems in Qlik Analytics as Qlik will associate tables based on identically named columns. Assuming you cannot exclude the columns, renaming them (e.g. myTable_CreatedBy, myTable_CreatedDate) will avoid this issue.
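
As a small, hypothetical illustration, the renaming can be done with column aliases in the view:

-- Hypothetical sketch: alias standardised audit columns with the table name so Qlik
-- does not associate unrelated tables on CreatedBy / CreatedDate.
CREATE VIEW analytics.v_invoice AS
SELECT invoice_id,
       customer_id,
       invoice_amount,
       CreatedBy   AS invoice_CreatedBy,
       CreatedDate AS invoice_CreatedDate
FROM dbo.invoice;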

 

4. Use unique single-column primary keys where possible to avoid synthetic keys

Qlik does not support composite (multi-column) keys in the same way databases do. When Qlik encounters a multi-column key it creates a synthetic key based on all the combinations of values of the shared fields. Synthetic keys use extra memory and can impact performance, as well as increasing the complexity of the analytics application. They are often not required and arise from factors such as standardised names (see #3). If there is no natural key, you may be better off using an autonumber or GUID function to create a unique key. See Synthetic keys for more information.
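
One way to do this in the database, sketched below with hypothetical names, is to build a single surrogate key from the composite key columns so that Qlik only sees one key field:

-- Hypothetical sketch: combine a composite key (order_id + line_no) into one
-- surrogate key column. Concatenation syntax (CONCAT vs ||) varies by database.
CREATE VIEW analytics.v_order_lines AS
SELECT CONCAT(order_id, '|', line_no) AS order_line_key,  -- single key field for Qlik
       order_id,
       line_no,
       product_id,
       quantity
FROM dbo.order_lines;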

 

5. Use the same naming convention for primary & foreign keys

Closely related to #4, Qlik expects primary and foreign key columns to have the same name. So if your primary key is person.id and your foreign key is role.person_id, they will not be associated by default. The solution is to rename person.id to person.person_id.
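
In a view this is again just an alias; a minimal, hypothetical sketch:

-- Hypothetical sketch: expose the primary key under the same name as the foreign key
-- so Qlik associates the person and role tables automatically.
CREATE VIEW analytics.v_person AS
SELECT id AS person_id,   -- matches role.person_id
       first_name,
       last_name
FROM dbo.person;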

 

6. De-normalise beforehand if possible

There are no benefits from a highly normalised data model in Qlik. Qlik stores distinct values in symbol tables, so denormalised tables require little extra data storage, while extra joins impact performance. Denormalising the data before it reaches Qlik therefore removes the complexity of having to do this afterwards.
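
For instance, a small lookup table can be folded into its parent table in the view (hypothetical names again):

-- Hypothetical sketch: denormalise a category lookup into the product table so Qlik
-- loads one table instead of two.
CREATE VIEW analytics.v_product AS
SELECT p.product_id,
       p.product_name,
       c.category_name      -- folded in from the product_category lookup
FROM dbo.product AS p
LEFT JOIN dbo.product_category AS c
       ON c.category_id = p.category_id;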

 

7. Use star schemas if possible, snowflake if not

Closely related to #6, Qlik Analytics performs best against a star schema, and well against a snowflake schema. This is also true of many other query tools, so transforming into one of these schema styles in your database will provide value across your data landscape. These transformations from a normalised schema can sometimes be easier to make in SQL outside of Qlik Analytics than if they need to be hand-coded in Qlik script. See star and snow & When do we use Snow Flake schema in Data Model?. And as you ideally only want a single fact table, you may want to consider concatenated fact tables (see Concatenate vs Link Table).
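
A concatenated fact table can also be prepared in the database. The sketch below, with hypothetical names, combines two fact tables of the same grain with a UNION ALL, which corresponds to concatenation in the Qlik data model:

-- Hypothetical sketch: concatenate two facts of the same grain into one fact view
-- so the Qlik app keeps a single fact table.
CREATE VIEW analytics.v_fact_transactions AS
SELECT order_id    AS transaction_id,
       customer_id,
       order_date  AS transaction_date,
       amount,
       'SALE'      AS transaction_type
FROM dbo.orders
UNION ALL
SELECT refund_id   AS transaction_id,
       customer_id,
       refund_date AS transaction_date,
       -amount     AS amount,
       'REFUND'    AS transaction_type
FROM dbo.refunds;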

 

8. Filter data in the SQL statement (or before) where possible

When loading data from a source database, it is common to filter the data to provide just what is needed for your use case. It is possible to filter any data source; however, if this is done within a Qlik Analytics LOAD statement, the data will first be read by Qlik before the rows that do not meet the filter are excluded. This increases load times, network bandwidth and resource usage. If, however, the filter is placed on the SELECT statement, it will be pushed down to the source and data excluded by the filter will never be sent to Qlik. In the following example the end result is the same, but in the former the filtering is done after the data has been read by Qlik, while in the latter it is pushed down to the database and the excluded rows are never seen by Qlik.

[Image: the same filter applied in the Qlik LOAD statement (evaluated by Qlik after all rows are read) versus in the SQL SELECT statement (pushed down to the database)]

If this filter is static, it would make sense to include it as part of the view to ensure self-service users do not load unnecessary data.
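
As a hypothetical sketch of that suggestion, the static filter can be baked into the view so the database excludes the unneeded rows before anything crosses the network:

-- Hypothetical sketch: a static filter inside the view. Date-function syntax
-- (DATEADD here) is SQL Server-style and varies by database.
CREATE VIEW analytics.v_orders_recent AS
SELECT order_id, customer_id, order_date, amount
FROM dbo.orders
WHERE order_date >= DATEADD(year, -3, CURRENT_TIMESTAMP);  -- keep the last three years only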

 

9. Only keep the level of precision you need

Analytics rarely involves drilling down to individual transactions. Therefore, less precision is often needed for analytics than for an operational system. Reducing precision (say from 8 to 4, or even 2, decimal places) saves space, reduces the data transferred and improves performance. If you are doing this, it makes sense to do it before the data gets to Qlik, and this could be embedded in a view. For example, if I have a ‘rate’ field that allows 6 decimal places but I only need 2, I could use a statement like “cast(rate as NUMERIC(6, 2))”. Doing this before the data gets to Qlik will provide better consistency across your Qlik landscape and better performance for your self-service users.
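
Embedded in a view, that cast might look like this (hypothetical names):

-- Hypothetical sketch: reduce a six-decimal rate to two decimals in the view.
CREATE VIEW analytics.v_rates AS
SELECT currency_code,
       CAST(rate AS NUMERIC(6, 2)) AS rate   -- source column stores six decimal places
FROM dbo.rates;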

 

10. Plan for incremental loads if possible

Remember when we said to exclude columns not relevant for analytics (#2)? Well, there is an exception, and it relates to the need to understand when data was changed so you can perform incremental loads. Storing the last-modified date allows us to look only at the data changed since our last reload, which will make a big difference to performance for incremental loads.
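
Below is a minimal sketch of the extract side of an incremental load, with hypothetical names; the :last_reload_time placeholder stands in for a timestamp that the load process would store (for example in a QVD or control table) and substitute into the query:

-- Hypothetical sketch: only read rows changed since the last successful reload.
-- :last_reload_time is a placeholder supplied by the load process.
SELECT order_id, customer_id, order_date, amount, last_modified
FROM dbo.orders
WHERE last_modified > :last_reload_time;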

 

Final thoughts

The suggestions on this list are not the only things you can do to make your source database Qlik Analytics-ready, but they deserve consideration as they can provide a lot of value. And this is not the only approach you can take; there may be good reasons for taking a different approach in some of these areas. However, at least considering these factors will help you to build an analytics-ready database.


What other suggestions do you have? What has your organisation done to create a Qlik Analytics ready environment? Let me know in the comments!

Comments
QFabian (Specialist III): Thanks @Leigh_Kennedy for this article, very useful