Solved: Data Model - To Join or Not to Join - Qlik Community

Anonymous · ‎2017-08-15

I have the following data model. It is based on logged vehicle data readouts. Most everything is linked together by the field READING_ID. In the source database, the readings are in one big stacked table (i.e. each READING_ID has thousands of records). I have setup processes to extract readings and split them apart because each reading has a lot of different types of information that need to be processed differently to work in this QlikView app.

The Readings table has 1 record per READING_ID. But 3 of the other tables (Ranking.RIDs, IUPR, and CALID_CVN) also contain 1 record per READING_ID. So, in theory, I could join them all together and not make the Readings table any longer, but make it much wider.

I like my original data model because it's easy for me to understand and track as I am adding more tables by extracting different data points from the readings.

My question is, would there likely be any performance gain in the application by joining everything together to make 1 big (wide) Readings table? Right now I am working on a small test set of data but in the production version this will be millions - 10s of millions of records.

rwunderlich‌,hic‌‌ I would highly value your inputs.

Of course this is open to the whole community so I welcome your feedback also.

Oleg_Troyansky · ‎2017-08-15

Hi Casey,

I'm neither Rob nor HIC, but let me add my $0.02...

At 10s of millions of rows, performance becomes increasingly important, so your question is very much appropriate. Unfortunately, in your case, you have a number of conflicting factors, so I'd suggest that you test any hypothesis prior to accepting it. Here are a few thoughts:

- QlikView works better with one large table, than with several large tables. From this perspective, four large tables should conceptually work slower than one large table. So, JOIN.

- On the other hand, QlikView doesn't work too well with extremely wide tables, because it needs to access the whole row of data, even if a certain calculation requires only 2-3 fields. So, from this standpoint, DON'T JOIN.

- At the same time, chart calculations are increasingly slower when fields from different tables are being aggregated together: SUM(A.F1 * B.F2) is very slow if the two fields are stored in separate tables. So, if you have any calculations that involve fields from multiple tables, then JOIN.

- This suggestion is a "no-brainer" - it's hard to imagine than ALL of your fields participate in the analysis. Quite possibly, many of these fields were loaded "just in case", but are not being actively used. If you analyze your application with Rob's Document Analyzer, you will find what fields are used or not used, and what fields contribute to the highest memory consumption. That will help you narrow your data down to the necessary minimum and also understand the various factors that drive performance of your app.

We teach these principles of good and bad performance at the Masters Summit for Qlik. Join us in Boston this October for a more detailed overview of QlikView performance factors.

cheers,

Oleg Troyansky

Oleg Troyansky
Call me for expert support, performance tuning, advanced training and more!

View solution in original post

Oleg_Troyansky · ‎2017-08-15

Hi Casey,

I'm neither Rob nor HIC, but let me add my $0.02...

At 10s of millions of rows, performance becomes increasingly important, so your question is very much appropriate. Unfortunately, in your case, you have a number of conflicting factors, so I'd suggest that you test any hypothesis prior to accepting it. Here are a few thoughts:

- QlikView works better with one large table, than with several large tables. From this perspective, four large tables should conceptually work slower than one large table. So, JOIN.

- On the other hand, QlikView doesn't work too well with extremely wide tables, because it needs to access the whole row of data, even if a certain calculation requires only 2-3 fields. So, from this standpoint, DON'T JOIN.

- At the same time, chart calculations are increasingly slower when fields from different tables are being aggregated together: SUM(A.F1 * B.F2) is very slow if the two fields are stored in separate tables. So, if you have any calculations that involve fields from multiple tables, then JOIN.

- This suggestion is a "no-brainer" - it's hard to imagine than ALL of your fields participate in the analysis. Quite possibly, many of these fields were loaded "just in case", but are not being actively used. If you analyze your application with Rob's Document Analyzer, you will find what fields are used or not used, and what fields contribute to the highest memory consumption. That will help you narrow your data down to the necessary minimum and also understand the various factors that drive performance of your app.

We teach these principles of good and bad performance at the Masters Summit for Qlik. Join us in Boston this October for a more detailed overview of QlikView performance factors.

cheers,

Oleg Troyansky

Oleg Troyansky
Call me for expert support, performance tuning, advanced training and more!