Hi.
I have come to a crossroads in two of my applications and I am lacking some knowledge about Qlik's internals to help me decide how to continue.
Case No. 1 - An application with 36M records. I switched the application from a star schema to a link table. The app metadata analyzer application provided by Qlik showed that this change saved about 2 GB(!) of memory, BUT there is a decrease in performance, especially in pivot tables. My question is: is there a rule (or at least a rule of thumb) about the data volume and which design to use?
Case No. 2 - An application with 3 fact tables, all very different in structure. The Orders table has a 1:N relation to the Mat_Mov table, and Mat_Mov has a 1:N relation to the HilanHours table. There is also a need for two separate calendars: one derived from Orders and the other from Mat_Mov. Based on that (and to save memory), I decided not to use any of the standard designs and to link the fact tables in a "logical line" - see the design attached and the simplified load sketch below. Since the data volume is low (about 200K rows) the performance is excellent, but again, is there a rule about the data volume at which performance will start to deteriorate?
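Roughly what the load looks like, with placeholder field names (the real structure is in the attached design):

Orders:
LOAD OrderID,
     OrderDate,
     OrderAmount
FROM Orders.qvd (qvd);

Mat_Mov:
LOAD MovementID,
     OrderID,          // 1:N link back to Orders
     MovementDate,
     Quantity
FROM Mat_Mov.qvd (qvd);

HilanHours:
LOAD MovementID,       // 1:N link back to Mat_Mov
     Employee,
     WorkedHours
FROM HilanHours.qvd (qvd);

// Two separate calendars, one per date field
OrderCalendar:
LOAD DISTINCT OrderDate,
     Year(OrderDate)  AS OrderYear,
     Month(OrderDate) AS OrderMonth
RESIDENT Orders;

MovementCalendar:
LOAD DISTINCT MovementDate,
     Year(MovementDate)  AS MovementYear,
     Month(MovementDate) AS MovementMonth
RESIDENT Mat_Mov;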
Regards,
Motty
The official recommendation from Qlik for a data model is to develop it in the direction of a star schema, because it's usually the best compromise in regard to load run times, memory consumption, complexity, maintainability and UI performance.
The UI performance is often the most important factor for the (end) users and also for the companies. The reasons are quite clear: storage and RAM are cheap (maybe not at first glance when purchasing them, but compared with the extra waiting times for n users and the wanted user experience), and load run times can easily be controlled with appropriate incremental logic and by dividing the loads over multiple time frames. Often neglected are complexity and maintainability, which can become very expensive if requirements can't be solved in a rather short time and/or need external support - therefore keep it as simple as possible, which mostly results in a star schema.
This means that choosing any other data model rather seldom results in benefits - rather in specific scenarios (single applications), but not as a general approach (especially not when developing, maintaining and administering a whole environment). Nearly all alternatives to a star schema require more expertise, work and time to get working. Further, the bigger the datasets and the larger the number of users and applications, the more important it becomes to apply the most similar measures to all processes, and here again it's more likely that you can reach that with a star-schema logic.
From a theoretical point of view a more normalized data model needs less RAM, and therefore it may look as if that should also be true for a link-table model. But in my experience it's often not the case, because the link table can become much larger than the fact tables, and with it the size of the needed pointers. This leads to the most important factor in regard to the needed RAM, which is the cardinality of the fields - a very good explanation is the following post: The Importance Of Being Distinct - Qlik Community - 1466796. If the RAM consumption of your application is now significantly decreased, I could imagine that it's caused by such changes in distinct field values rather than by the switch of the data model.
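A small example of what such a cardinality reduction could look like in script (field names are only placeholders):

// Splitting a high-cardinality timestamp into a date part and a time part
// reduces the number of distinct values the symbol tables must store,
// and replacing a long composite text key with a sequential integer
// shrinks the key field as well:
Facts:
LOAD OrderID,
     Floor(OrderTimestamp) AS OrderDate,   // only ~365 distinct values per year
     Frac(OrderTimestamp)  AS OrderTime,   // at most 86,400 distinct values (second precision)
     AutoNumber(OrderID & '|' & LineNo, 'FactKey') AS %FactKey
FROM Facts.qvd (qvd);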
- Marcus
Hi Marcus.
First, thanks for your reply.
When it comes to the complexity of star vs. link... well, I find the link table much easier to maintain than the star... but that's just me.🙂
As for the memory reduction in case 1 - I guess it came from distinct values and from the fact that concatenating two tables, one with 30M rows (and 200 columns) and the other with 6M (and 80 columns), with almost no fields in common, meant millions of cells with NULL - and (as far as I know) NULL, in Qlik, does not have a fixed length (hence no fixed memory consumption). I can see that some of the pivot tables do work faster - those presenting data specific to one of the fact tables - and the memory reduction is quite impressive considering that a link table with millions of records was added, but the app is generally slower (BTW - the QVF itself did grow in size, but that was expected).
Now, since there is probably no rule of thumb for memory, is it possible that with the new design the memory was saved at the expense of CPU usage? Theoretically, the CPU now has to "work" much harder - I guess because the engine must calculate across more tables - but is there a rule of thumb?
Regards.
Motty
NULL itself isn't a problem because it's not a value and won't be stored in any way - anywhere. That concatenated fact tables are to some degree asynchronous, and all not-available field values are later treated as NULL, might not be perfect, but usually it works very smoothly.
If in the end there are many more differing fields than common ones, it means there is potential to optimize the table structures so that the ratio of differing to common fields is reversed - so that only a few fields differ, or maybe all parts become the same. That means no qualifying of fields in any way, no crosstable structures, unifying field names, outsourcing fields into dimension tables and so on.
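For example, such a unifying of fields and outsourcing into a dimension table could look like this (all names are only placeholders):

// Unifying field names before concatenating, so the fact tables share
// as many common fields as possible:
Facts:
LOAD OrderID       AS %DocumentID,
     CustomerID,
     OrderDate     AS DocumentDate,
     OrderAmount   AS Amount
FROM Orders.qvd (qvd);

CONCATENATE (Facts)
LOAD InvoiceID     AS %DocumentID,
     CustomerID,
     InvoiceDate   AS DocumentDate,
     InvoiceAmount AS Amount
FROM Invoices.qvd (qvd);

// Descriptive fields are outsourced into a dimension table instead of
// being repeated within the fact table:
Customer:
LOAD CustomerID,
     CustomerName,
     CustomerRegion
FROM Customers.qvd (qvd);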
Very important is that all relevant fields which are used in an object in the UI come from one table, because Qlik creates for each UI object a virtual table, fetching all included fields to build the dimensional context for the aggregations which are performed there. The creation of those virtual tables is the heaviest work and AFAIK is to this day a purely single-threaded process - the further aggregations are then nearly all fully multi-threaded.
Therefore the suggestion is to keep the number of tables small (not necessarily a single flat table or a star schema, but the bigger the dataset and/or the more complex the calculations/views, the more advantages the de-normalization brings), to associate them properly, and at least to move the essential measures/fields to one table.
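A sketch of moving an essential measure into the main table (table and field names are only placeholders):

// The aggregated quantity from a second fact table is mapped into Orders,
// so UI objects which combine both measures can fetch everything from one table:
MovementQtyMap:
MAPPING LOAD OrderID,
     Sum(Quantity) AS TotalQuantity
FROM Mat_Mov.qvd (qvd)
GROUP BY OrderID;

Orders:
LOAD OrderID,
     OrderDate,
     OrderAmount,
     ApplyMap('MovementQtyMap', OrderID, 0) AS MovementQuantity
FROM Orders.qvd (qvd);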
- Marcus
One more question:
How do Section access tables affect all of the above?
Section access is applied when opening the application and works like a selection within the UI, except that it doesn't write true/false into the state tables - instead it removes the excluded data from the data model and from RAM. This means it creates a certain overhead, but only during the opening - afterwards there is no further effect, other than that the data model is now more or less shrunk and all calculations should be faster than without section access (unless the section access removed relevant fields/field values, so that certain calculations become invalid and/or may try to calculate Cartesian products).
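A minimal sketch of such a section access with data reduction (the users and the reduction field are only placeholders):

Section Access;
LOAD * INLINE [
    ACCESS, USERID, REGION
    ADMIN, DOMAIN\ADMINUSER, *
    USER, DOMAIN\MOTTY, NORTH
];
// '*' means all values listed in this table, not all values in the data
Section Application;

// The reduction field must also exist in the data model,
// with an upper-case name and upper-case values:
Orders:
LOAD OrderID,
     Upper(Region) AS REGION,
     OrderAmount
FROM Orders.qvd (qvd);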
- Marcus
Thx🙂👍