wizardo
Creator III

Impact of the number of fields in Qlik - how does it affect memory and CPU?

Hi,

I know that more fields in the dataset take more memory, simply because it is more data. But given the same amount of data, what is the impact of arranging the data model one way or another (link table, concatenated fact, star schema, snowflake schema, etc.)?

If I have a fact table with 10 fields, connected (one to one) to a dimension table with 10 fields, will there be a difference between loading two linked tables with 10 fields each and loading one joined table with 20 fields?

And what if it's 100 fields? Or 500? Or more fact tables and dimensions?

Which will be more efficient, and how exactly (or not so exactly) does the engine calculate this?
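To make this concrete, here is a rough sketch of the two alternatives with made-up field names and data (not my actual app):

// Alternative 1: two tables linked one-to-one on OrderID
Fact:
LOAD * INLINE [
OrderID, Quantity, Amount
1, 2, 20
2, 5, 50
];

Dim:
LOAD * INLINE [
OrderID, Product, Country
1, Widget, NO
2, Gadget, SE
];

// Alternative 2: the same data as a single wide table
// (in a real app you would keep only one of the two alternatives)
FactWide:
NoConcatenate LOAD * RESIDENT Fact;
Left Join (FactWide) LOAD * RESIDENT Dim;
DROP TABLES Fact, Dim;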

Does anyone know, or can anyone shed some light on this?

Mind you, I'm not looking for optimization tips. I know that if I change the data model this way or that, I can achieve lower memory/CPU consumption, but that's not what I'm after.

EDIT - inserted after @petter's remark:

-Thank you, Petter, for your remark. Given how I phrased it, you are absolutely correct, so let me try to clarify.

-First of all, I'm interested in this from an intellectual point of view: I like knowing how the product works internally and understanding its intricacies.

That said, the end goal is (as you suggest) optimization: how to better use the product to include more data and get better performance, but with the same data and without removing fields (I have already used rwunderlich's wonderful Document Analyzer and eliminated unused/unnecessary fields).

One more thing: I am familiar with various data structures commonly (and uncommonly) used in building data models, as mentioned above, but I feel I lack some knowledge in the area my post describes, so I turned to the best expert I know: the Qlik Community.

-END EDIT

I am talking about the case where I have a certain number of fields, and that number stays the same no matter how I arrange the data model.

Thanks

Daniel

6 Replies
petter
Partner - Champion III

Sorry - I have a problem understanding why you are asking the question if, as you state, you are not interested in optimizing anything. It would be a waste of time if you never, at some point, use what you learn to build better (optimized) solutions. Maybe it's just me having a broader definition of optimization than you - I don't know.

wizardo
Creator III
Author

Hi Petter, and thank you,

I have inserted an edit in the original post; I hope it supplies enough information to answer your question.

I would appreciate any input you might add.

Thanks

Daniel Chotzen

swuehl
MVP

Have a look at

Symbol Tables and Bit-Stuffed Pointers

That's fundamental to understanding how Qlik works internally.

Miguel_Angel_Baeyens

As Petter has pointed out above, there are several factors to consider when it comes to the performance of QlikView, both server and application. But first and foremost, and deliberately avoiding a direct reply to your question, the real question is: what do you have now, and what do you want to do with it?

Assuming there is no perfect data model (there is always that one step you cannot do in the source and that is too heavy to do in the expressions, so you end up doing it in the script), and that you will likely have tight requirements from the business (response times, reload times, the types of objects you have to display on screen, etc.), there are several good answers, even if they are not the best-performing of all possible answers.

There are, however, some rules of thumb which have been proven over time by the experience of the people you mention and of others (please check the Qlik Scalability group for all things performance-related in Qlik). Some of these rules are well known: use numeric values to link tables as much as possible, even if you show literal values on screen (e.g. dates); use more columns instead of more rows in the Qlik data model; use set analysis; JOIN tables; etc. Other improvements depend entirely on your data model.
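As a rough illustration of the numeric-key rule (table names, field names, and data below are invented, not from this thread), AutoNumber() can replace a text key with a compact sequential integer so the link field's symbol table stays small:

// AutoNumber() maps each distinct CustomerCode to a small sequential integer.
// Both tables use the same AutoID ('cust') so the generated keys match.
Orders:
LOAD
    OrderID,
    AutoNumber(CustomerCode, 'cust') AS %CustomerKey,
    Amount
INLINE [
OrderID, CustomerCode, Amount
1, CUST-0001, 120
2, CUST-0002, 80
];

Customers:
LOAD
    AutoNumber(CustomerCode, 'cust') AS %CustomerKey,
    CustomerName
INLINE [
CustomerCode, CustomerName
CUST-0001, Acme
CUST-0002, Globex
];

The tables now link on the numeric %CustomerKey instead of the longer text code (note that AutoNumber values are only consistent within a single reload).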

I strongly recommend getting familiar with the Qlik Scalability Tools. They provide good input on how the same application performs under different circumstances and let you simulate load (users and clicks) to see how your app performs from the user's perspective.

And adding to Stefan's post above, the entire Qlik Design Blog is a must if you want to know the intricacies of the product from a broader, more educational point of view, especially the posts written by Henric Cronström, who has several series focusing on exactly how the product works (bit-stuffed pointers, the QIX engine, set analysis, JOIN, concatenation, null values, etc.).

petter
Partner - Champion III

The keyword for Qlik data models is the cardinality of each field. Do your very best to avoid many high-cardinality fields, as they consume much more memory: Qlik's built-in compression techniques won't be very effective on them. Look specifically at what Stefan (swuehl) posted about Symbol Tables and Bit-Stuffed Pointers to understand the compression techniques and why controlling cardinality is crucial. For example, a timestamp should be rounded to the coarsest granularity you actually need. Don't keep timestamps as-is when loading them from a source; round or truncate them to the time precision your analysis requires - for instance, every 5 minutes.
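As a rough sketch of that idea (table name, field names, and timestamp format below are invented for illustration), split the date part off and round the time part to 5-minute steps, so the two resulting fields have far fewer distinct values than the raw timestamps:

// Parse the raw timestamp, keep the date as one field and the time
// rounded to the nearest 5 minutes (5/1440 of a day) as another.
Events:
LOAD
    EventID,
    Date(Floor(Timestamp#(EventTime, 'YYYY-MM-DD hh:mm:ss'))) AS EventDate,
    Time(Round(Frac(Timestamp#(EventTime, 'YYYY-MM-DD hh:mm:ss')), 5/1440)) AS EventTime5min
INLINE [
EventID, EventTime
1, 2024-01-15 09:03:27
2, 2024-01-15 09:04:55
];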

rwunderlich
Partner Ambassador/MVP

It's a rather complex question, as the performance of the model depends on the expressions you use. As others have pointed out, the posts by hic on the Qlik Design Blog are the best place to get an in-depth understanding of how things work under the covers. Here are some generalized answers to your question.

1. In general, the most efficient (calculation time) data model is a single table, although this is usually not the most efficient for RAM consumption.

2. A model that requires more RAM can have an impact on chart calculation time. This is a machine and Windows architecture issue rather than a QV software limitation.

3. The number of hops or links between tables may have a negative impact on chart calculation time.  This is why a star schema is generally favored over a snowflake schema.

4. A significant inefficiency comes about when the fields used in an expression are in two or more tables. For example, "Sum(Quantity * UnitPrice)" where those fields are in different tables.

Returning to your original question: if the dimension table contains fields that are used only as chart dimensions, I would leave it as a separate table. This optimizes RAM usage at no cost to calculation time.

If any of the fields in the dimension table are used in expressions, I would move those fields to the fact table.
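As a rough sketch of that approach (table names, field names, and data below are invented for illustration): UnitPrice is used in expressions, so it is joined into the fact table, while the purely descriptive fields stay in the dimension table:

// Fact table: Quantity lives here.
Facts:
LOAD * INLINE [
OrderID, ProductID, Quantity
1, P1, 2
2, P2, 5
];

// Dimension table: ProductName is only used as a chart dimension, so it stays here.
Products:
LOAD * INLINE [
ProductID, ProductName, UnitPrice
P1, Widget, 10
P2, Gadget, 4
];

// UnitPrice is used in expressions such as Sum(Quantity * UnitPrice),
// so copy it into the fact table...
Left Join (Facts)
LOAD ProductID, UnitPrice
RESIDENT Products;

// ...and remove it from the dimension table so the tables still link
// only on ProductID (otherwise a synthetic key would be created).
DROP FIELD UnitPrice FROM Products;

With this layout, Sum(Quantity * UnitPrice) reads both fields from a single table.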

My overarching rule is: first, code for clarity; second, optimize only when necessary. My primary design criterion for a data model is to make it easy to understand and use, with minimal effort to create and maintain. I only start thinking about getting clever or sophisticated when I have a specific problem to solve.

-Rob

http://masterssummit.com

http://qlikviewcookbook.com

http://www.easyqlik.com