What is the most efficient way to create dimensions in load script when calculations are almost similar?
Here's an example
monthlyWorkdays //Active workdays per month
* expected_workday //Active work hours per day
* vacationfactor //Factor for typical vacation months
* FTE //Adjustment per worker
* Rate //Cost per hour
* Capex //Expected share of Capex-type work
This is intended to calculate how much a particular worker will cost for Capex type work in a period.
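As a sketch, the single-pass version of that calculation could look like the following in the load script. Only the factor fields are from the question; the table name, the LineID key, and the file path are made-up placeholders:

```
// Sketch only: table name, path and LineID key are assumptions
Costs:
LOAD
    LineID,
    monthlyWorkdays * expected_workday * vacationfactor
        * FTE * Rate * Capex        as CapexCost
FROM [lib://Data/workers.qvd] (qvd);
```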
However, I may ALSO want to calculate the full cost, how many hours are expected, and maybe even the basic number of work hours per month.
Then I see two ways to do this (at least)
1) Replicate the calculation, with the relevant factors, for each dimension
2) Build from the ground up: do the most basic calculation first, then iterate with LEFT JOIN, using each value as the base for the next dimension's calculation
I don't know the inner workings of Qlik Sense, so I'm wondering if one method is more efficient than the other. On one hand, calculating from already loaded data could be quicker than reloading; on the other hand, simply adding the next calculation per line ID could be just as easy if loading resident content is no hassle at all.
I'm thankful for any information that can help me choose the approach that gives the most efficient load at larger scales.
Wouldn't you be able to calculate all of these using one load? If not, build your first pass of the table with all the calculations you can do in the first load — that is, if all the fields you need are readily available. Next, left join to it all the fields you need to calculate the remaining computed fields. Then create and load from this base table to calculate all your expressions. That's one way.
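A rough sketch of that three-step flow, where the table names, file paths, and the LineID key are made up for illustration (only the factor fields come from the question):

```
// 1) First pass: everything computable from the raw fields
Base:
LOAD LineID,
     monthlyWorkdays,
     expected_workday,
     monthlyWorkdays * expected_workday as MonthlyHours
FROM [lib://Data/source.qvd] (qvd);

// 2) Left join the extra fields needed for later expressions
Left Join (Base)
LOAD LineID, FTE, Rate
FROM [lib://Data/rates.qvd] (qvd);

// 3) Final pass over the joined base table for the remaining expressions
Final:
NoConcatenate
LOAD *,
     MonthlyHours * FTE * Rate as FullCost
Resident Base;
DROP TABLE Base;
```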
There will be multiple factors to consider as well. For example, if your files are huge and in QVD format, you may want to calculate after you've loaded the QVD; calculating at load time will slow down the QVD file load.
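For context: a QVD loads in fast "optimized" mode only when the LOAD has no transformations; adding expressions forces a slower standard load. One pattern (paths and names below are assumptions) is to load the raw QVD first and calculate in a resident pass:

```
// Optimized QVD load: no expressions, so Qlik can use the fast path
Raw:
LOAD LineID, monthlyWorkdays, expected_workday
FROM [lib://Data/hours.qvd] (qvd);

// Do the calculation afterwards in a resident pass
Hours:
NoConcatenate
LOAD *,
     monthlyWorkdays * expected_workday as MonthlyHours
Resident Raw;
DROP TABLE Raw;
```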
Another option is to calculate everything you can on the ETL side rather than in Qlik. Databases are designed to manipulate huge data tables quickly, and if you have an ETL tool, that will help speed up the whole process.
Of course, creating different tables with separate expressions and then joining them to your base table will work as well.
You need to test the different strategies to see which one is fastest and most understandable for you.
Thanks for the input, even though it didn't give me a clear answer on which approach is best.
Yes, I can calculate everything in one load, by doing
factor1 * factor2 as dimension2,
factor1 * factor2 * factor3 as dimension3,
factor1 * factor2 * factor3 * factor4 as dimension4,
[...]
factor1 * factor2 * factor3 * factor4 [...] * factorN as dimensionN
However, my question is whether all the repeated calculations of factor1*factor2 take more effort than
Table:
LOAD LineID, factor1 * factor2 as dimension2 ... ;

Left Join (Table)
LOAD LineID, dimension2 * factor3 as dimension3
Resident Table;

Left Join (Table)
LOAD LineID, dimension3 * factor4 as dimension4
Resident Table;

[...]

Left Join (Table)
LOAD LineID, dimensionN-1 * factorN as dimensionN
Resident Table;
...where factor1*factor2 is only calculated once (per row), but there's additional effort in the resident reload of Table.
So if the reload is no hassle, the calculation is only done once instead of N times.
If the reload takes significant effort, there's no benefit in reloading N times just to avoid calculating factor1*factor2 N times.
So basically, it's a question of which takes more effort in total: the resident load or the repeated calculation.
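Since the resident-load cost depends on data volume and hardware, one way to settle this empirically is to timestamp each script variant and compare the logged durations (variable names below are arbitrary):

```
// Wrap each variant in timestamps and compare the logged durations
LET vStart = Now();

// ... run variant 1 (single load) or variant 2 (iterated left joins) here ...

LET vElapsed = Interval(Now() - vStart, 'hh:mm:ss');
TRACE elapsed: $(vElapsed);
```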