Qlik Community

Ask a Question

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Henric_Cronström

 

Databases are usually not very forgiving.

Strict rules apply, defining what’s allowed and what’s not. For example, you are not allowed to enter data unless it has the right data type and is formatted the right way. Further, you are often not allowed to enter a value for a foreign key unless this value already exists in the master table. And you are not allowed to enter the same value twice if the field is a primary key.

The reason is of course to ensure data integrity. Without such rules, the database would soon be cluttered with bad quality data and contain a large number of errors.

The fact is that a good system is one that has a large number of rules, but at the same time is easy to use: Equipped with a user interface designed in a way so that the user doesn’t notice the rules – or at least isn’t disturbed by them.

But with the Qlik engine it is a very different situation.

QlikView and Qlik Sense should not make sure that the data is free from errors. Instead, they should do exactly the opposite: Display the source data along with all its errors. This requirement is totally different from the demands you have on a database, and as a result the Qlik engine is built in a different way:

 

No Data Types.pngData Types

There are no data types in the Qlik engine. The reason is simple: You may have data from different tables or even from different data sources in one single field. Then there is a potential risk that you have different data types in the different sources.

When loaded, all fields are converted into duals (number and text, or just text), and so one field can contain data that originally had different types.

 

Mixed Formats.pngFormatting

A single field can have a mixed data format. Also here, the reason is simple: Different sources may have different formats. As a result, it doesn’t matter if a date is formatted as 3/31/16, 2016-03-31 or 42460. They will all three represent March 31, 2016.

Each distinct field value has its own format, and a single field may thus be displayed with different formats.

 

No Referential Integrity.pngReferential integrity

The Qlik engine does not enforce referential integrity. For example: You may have a customer ID in your fact table that does not exist in the customer table (which would be an error in the data integrity of the database). But the Qlik engine will accept this and show NULL as customer name.

 

Assume Many To Many.pngRelationship type

Often you know if you have a many-to-one or a many-to-many relationship between two entities. But this information is not loaded from the database. Instead the Qlik engine assumes worst case and is always prepared for a many-to-many relationship.

Links between tables don’t carry information about relationship type. And all calculations involve aggregations, since there is a possibility for multiple values of the referenced field.

The bottom line is that the Qlik engine is a very forgiving engine. It handles errors in all of the above cases gracefully. No matter how many such errors you have in the data, the Qlik engine will always make a best-effort attempt in evaluating and showing the loaded data.

HIC

 

Further reading related to this topic:

Data Types in QlikView

Automatic Number Interpretation

It’s all Aggregations

8 Comments
swuehl
MVP
MVP

Henric,

that's a great introduction into the differences between Qlik and common data bases!

Regarding the forgiving engine, there is at least one place where the engine could be a little more forgiving:

Searches in field values. You probably know best that numeric searches in dual fields require the textual value as search pattern most of the time. That seems to be an ongoning issue for many users of Qlik, when looking at the amount of threads here in the forum. I think someone even made a documentation about the hateful date recently.


Besides that, I love the flexibility of the engine which really makes it easy and fast to build a data model and start looking at the information hidden in the data.

I would also love to see a follow up blog post related to the differences between old (11.20) and new (12.00 / QS) engine.

Best regards,

Stefan

1,308 Views
rbecher
Luminary Alumni
Luminary Alumni

Dear Henric,

thanks again for a great explaining post. I would like to add a Qlik specifica into the Formatting paragraph. Although you can have different formattings for a numeric or date value you cannot have different formattings for the same occuring value in a field. Let's take March 31st and try to load 3 different formattings. It will not work. The first formatting will win. Which I think is an architectural problem:

Test:

LOAD Date(42460) as Date AutoGenerate(1);

LOAD Date(42460, 'D/M/YY') as Date AutoGenerate(1);

LOAD Date(42460, 'YYYY-MM-DD') as Date AutoGenerate(1);

Of course, there would be a workaround to store formattings in a separate text field. But I think most Qlik users are not aware of this..

Best regards,

Ralf

1,308 Views
barryharmsen
Luminary Alumni
Luminary Alumni

Hi Henric,

Completely agree with your assessment that the QIX engine is very forgiving, but just to play devil's advocate for a bit:

  • Just because you have the option of using referential integrity in a database does not mean you need to use it. Typical Data Warehouses and Data Marts do not enforce foreign key constraints (referential integrity) either. Instead that responsibility is delegated to the ETL process.

  • Some developers do turn on FK constraints as a 'safety net'. A foreign key constraint will not necessarily ensure good data, but it can catch errors in the ETL process. Errors/bad data quality is not only caused by poor source data, sometimes errors are also unknowingly introduced during the ETL process. I've seen plenty of scripts that introduced errors into data models that would've easily been caught had some sort of check been in place. (hence for example the assertions that I include in my data modeling session at the Masters Summit)

  • Foreign keys and other constraints also let us tell the database how we intend to use the data. This does not need to be very restrictive, and the query optimizer can use this information to produce faster running queries. Most of the time the QIX engine gives me ample performance, but wouldn't it be nice if we could squeeze out some extra performance by just telling the QIX engine a little bit more about our data model?

In our own line of work we are providing users with relevant information that will help them make better decisions. The same principle could apply to the QIX engine, providing it with more relevant information will help it make better decisions and improve performance. Of course, I understand that there is a trade-off versus flexibility/forgiveness, but having a choice would be preferable IMHO.

Kind regards,

Barry

0 Likes
1,308 Views
Henric_Cronström

Stefan

I absolutely agree that we should improve searches in date fields, and I promise that I will push for this. It should be as forgiving as possible in this area.

I'll think about the blog post. The risk is that there isn't enough to tell...

HIC

0 Likes
1,308 Views
Henric_Cronström

Ralf

You are absolutely right that it isn’t possible to have different formatting for different occurrences of the same field value. The reason is of course that the formatting is stored in the symbol table, and there each distinct value has exactly one record, not more. If the formatting were to be stored in the data table instead, you would bloat this table and the app would take much more space.

So I don’t see this as an architectural problem, but rather as consequence of the compact data storage model.

HIC

0 Likes
1,308 Views
Peter_Cammaert

I guess the entire article applies equally well to the old pre-12 QlikView engine. Performance differences aside, there aren't any mentions of behavioral changes in official Qix-related communications that I know of. Or did I miss something?

0 Likes
1,308 Views
eya
Employee
Employee

No Referential Integrity --> Full-Outer-Join --> NO DATA LEFT BEHIND

Many-to-Many --> No Chasm Trap --> NO DATA DOUBLE COUNTED

Business Impact: Correct numbers!

900 Views
Not applicable

Nice article.

0 Likes
900 Views