Jagsfan82
Partner - Contributor III

Data Model Performance

I was hoping to get some feedback on a data model that will have ~50 million rows and roughly 60 fields in total. I am mainly weighing the advantages of a star schema (maybe with a snowflake or two) against keeping as much as possible in one big fact table. I have attached a spreadsheet to hopefully make the questions clearer.

 

Tab “Long vs Wide”

  1. This is less of a question and more of a confirmation, as I think it is fairly straightforward.  Essentially we take a long table and turn it into a wide table so that some calculations can be done once in the load script (a sketch follows this list).  I would assume this helps significantly if the “original calculation” noted below is used very often.
    • Type D = Type A + Type B
    • Original calculation: Sum({<Type = {'A','B'}>} Measure1)
    • New calculation: Sum(Measure1_D)
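For concreteness, here is a minimal load-script sketch of that long-to-wide pivot. The source path and the field names (Facts.qvd, Key, Type, Measure1) are assumptions for illustration, not taken from the original post:

// Long source table: one row per Key/Type combination (assumed layout).
TempLong:
LOAD Key, Type, Measure1
FROM [lib://Data/Facts.qvd] (qvd);

// Wide table: one row per Key, with Type A + Type B precomputed as Measure1_D.
Wide:
LOAD Key,
     Sum(If(Type = 'A', Measure1)) AS Measure1_A,
     Sum(If(Type = 'B', Measure1)) AS Measure1_B,
     Sum(If(Match(Type, 'A', 'B'), Measure1)) AS Measure1_D
RESIDENT TempLong
GROUP BY Key;

DROP TABLE TempLong;

With the measure precomputed, the chart expression becomes a plain Sum(Measure1_D) with no set analysis to evaluate on every selection, which is where the front-end saving would come from.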

Tab “Dim Tables”

  1. The question here is mostly about the performance of splitting out dimension tables versus adding their fields onto the fact table.  As I understand it, if memory were not an issue, one giant table with all fields would work best from a processing-speed perspective.  I think a dimension with three or more levels of text may have to be split out into a star schema for memory reasons, and that makes sense.  My question is about dimensions that are rarely filtered or referenced in expressions and have only one or two levels (e.g. Code and Description) with a low degree of uniqueness: does splitting these out help anything, or is Qlik already doing something like this on the back end anyway?
  2. Essentially it seems like the example I provided is manually recreating something similar to what the Qlik engine already does when it builds its data tables and symbol tables on the back end.  So does it make more sense (assuming front-end performance is the concern and load-script performance is not) to keep as much as possible in the fact table and let Qlik handle the compression on its own?  (A sketch of that denormalized approach follows this list.)
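To picture the trade-off, here is a minimal sketch of folding a small two-level dimension (Code, Description) into the fact table with a mapping load instead of keeping it as a separate dimension table. The QVD paths and table names are assumptions for illustration:

// Mapping table: Code -> Description, from the (assumed) dimension source.
DescMap:
MAPPING LOAD Code, Description
FROM [lib://Data/CodeDim.qvd] (qvd);

// Fact table with Description denormalized onto it via ApplyMap.
Fact:
LOAD *,
     ApplyMap('DescMap', Code, 'Unknown') AS Description
FROM [lib://Data/Fact.qvd] (qvd);

Because Qlik stores each distinct value once in a symbol table and keeps only a compact pointer per row in the data table, a low-cardinality Description field added this way costs little extra memory; in that sense the engine is already doing much of the compression a star schema would otherwise provide.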

 

Thanks for any assistance you can provide.  Let me know if I can be more clear on either of these.

2 Replies
Brett_Bleess
Former Employee

Best I have is the following Design Blog post, but I am pretty sure there are also some prior Community posts on this topic. If you have not already searched the Community, I would recommend doing so; you should be able to find some of those as well.

https://community.qlik.com/t5/Qlik-Design-Blog/Data-Modelling-Clarity-vs-Speed/ba-p/1473644

Here is the base Design Blog URL in case you want to search further there:

https://community.qlik.com/t5/Qlik-Design-Blog/bg-p/qlik-design-blog

Scalability area is another good place to search:

https://community.qlik.com/t5/Qlik-Scalability/gp-p/qlik-scalability

And lastly, you may find something on Rob W's Cookbook pages too:

https://qlikviewcookbook.com/

Sorry I do not have anything better. You will likely need to attach a QVW with the data model etc. to get further input from others; trying to assess this ahead of time is difficult. It is better to put things together and post it when something is slow, so folks can see what you have done with the data model and expressions and comment on areas you may be able to improve.

Regards,
Brett

To help users find verified answers, please do not forget to use the "Accept as Solution" button on any post(s) that helped you resolve your problem or question.
I now work a compressed schedule, Tuesday, Wednesday and Thursday, so those will be the days I will reply to any follow-up posts.
Jagsfan82
Partner - Contributor III
Author

Thanks for the pointers! I'm pretty sure it makes little sense to try to do Qlik's job for it and expect better results, but I'll look into those links and see if I can get further clarification.