Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Data Model - To Join or Not to Join

I have the following data model. It is based on logged vehicle data readouts.  Most everything is linked together by the field READING_ID. In the source database, the readings are in one big stacked table (i.e. each READING_ID has thousands of records). I have setup processes to extract readings and split them apart because each reading has a lot of different types of information that need to be processed differently to work in this QlikView app.

DataModel.png

The Readings table has 1 record per READING_ID. But 3 of the other tables (Ranking.RIDs, IUPR, and CALID_CVN) also contain 1 record per READING_ID. So, in theory, I could join them all together and not make the Readings table any longer, but make it much wider.

I like my original data model because it's easy for me to understand and track as I am adding more tables by extracting different data points from the readings.


My question is, would there likely be any performance gain in the application by joining everything together to make 1 big (wide) Readings table? Right now I am working on a small test set of data but in the production version this will be millions - 10s of millions of records.

rwunderlich‌,hic‌‌ I would highly value your inputs.

Of course this is open to the whole community so I welcome your feedback also.

1 Solution

Accepted Solutions
Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi Casey,

I'm neither Rob nor HIC, but let me add my $0.02...

At 10s of millions of rows, performance becomes increasingly important, so your question is very much appropriate. Unfortunately, in your case, you have a number of conflicting factors, so I'd suggest that you test any hypothesis prior to accepting it. Here are a few thoughts:

- QlikView works better with one large table, than with several large tables. From this perspective, four large tables should conceptually work slower than one large table. So, JOIN.

- On the other hand, QlikView doesn't work too well with extremely wide tables, because it needs to access the whole row of data, even if a certain calculation requires only 2-3 fields. So, from this standpoint, DON'T JOIN.

- At the same time, chart calculations are increasingly slower when fields from different tables are being aggregated together:   SUM(A.F1 * B.F2) is very slow if the two fields are stored in separate tables. So, if you have any calculations that involve fields from multiple tables, then JOIN.

- This suggestion is a "no-brainer" - it's hard to imagine than ALL of your fields participate in the analysis. Quite possibly, many of these fields were loaded "just in case", but are not being actively used. If you analyze your application with Rob's Document Analyzer, you will find what fields are used or not used, and what fields contribute to the highest memory consumption. That will help you narrow your data down to the necessary minimum and also understand the various factors that drive performance of your app.

We teach these principles of good and bad performance at the Masters Summit for Qlik. Join us in Boston this October for a more detailed overview of QlikView performance factors.

cheers,

Oleg Troyansky

View solution in original post

1 Reply
Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi Casey,

I'm neither Rob nor HIC, but let me add my $0.02...

At 10s of millions of rows, performance becomes increasingly important, so your question is very much appropriate. Unfortunately, in your case, you have a number of conflicting factors, so I'd suggest that you test any hypothesis prior to accepting it. Here are a few thoughts:

- QlikView works better with one large table, than with several large tables. From this perspective, four large tables should conceptually work slower than one large table. So, JOIN.

- On the other hand, QlikView doesn't work too well with extremely wide tables, because it needs to access the whole row of data, even if a certain calculation requires only 2-3 fields. So, from this standpoint, DON'T JOIN.

- At the same time, chart calculations are increasingly slower when fields from different tables are being aggregated together:   SUM(A.F1 * B.F2) is very slow if the two fields are stored in separate tables. So, if you have any calculations that involve fields from multiple tables, then JOIN.

- This suggestion is a "no-brainer" - it's hard to imagine than ALL of your fields participate in the analysis. Quite possibly, many of these fields were loaded "just in case", but are not being actively used. If you analyze your application with Rob's Document Analyzer, you will find what fields are used or not used, and what fields contribute to the highest memory consumption. That will help you narrow your data down to the necessary minimum and also understand the various factors that drive performance of your app.

We teach these principles of good and bad performance at the Masters Summit for Qlik. Join us in Boston this October for a more detailed overview of QlikView performance factors.

cheers,

Oleg Troyansky