Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
oddgeir
Contributor III
Contributor III

What is better, repeated calculations or iterations with join - or something else?

What is the most efficient way to create dimensions in load script when calculations are almost similar?

Here's an example

 

 

	monthlyWorkdays				//Active workdays pr month	
    	*	expected_workday		//Active work hours a day
        *	vacationfactor			//For typical vacation months                        
        *	FTE			        //Adjustment pr worker
     	*	Rate 				//Cost pr hour
    	*	Capex				//Expected work type

 

 

This is intended to calculate how much a particular worker will cost for Capex type work in a period. 

However I may ALSO want to calculate the full cost, how many hours is expected, maybe even the basic number of work hours a month. 

Then I see two ways to do this (at least)

1) Replicate the calculation with relevant factors for each dimension
2) Build from ground up, by doing the most basic calculation, and then iterate with Left join to use this value for adding further calculations for the next dimensions 

I don't know the inner secrets of QlikSense so I'm just wondering if one method is more efficient than the other. On one hand calculations with already loaded data could be quicker than reloading. On the other hand simply adding the next calculation to the line ID could be just as easy if loading resident content isn't a hassle at all. 

I'm thankful to any information which can help me in choosing the approach that gives most efficient load in bigger scales. 

 

Labels (1)
2 Replies
edwin
Master II
Master II

wouldnt you be able to claculate all of these using one load?  if not then buld your first pass of the table with all the claculations you can with the first load - that is if all the fields you need are readily available.  next left join to it all the fields you need to calculate all the other computed fields.  then create and load from this base table to calculate all your expressions.  thats one way

there will be multiple factors to consider as well.  for example if your files are huge and in QVD format, you may want to calculate after youve loaded your QVD.  calculating at load will slow down the QVD file load.

other option is to calculate anything you can on ETL side and not in Qlik side.  databases are designed to manipulate huge data tables quickly and if you have an etl tool then that will help speed the whole process.

of course creating different tables with sperate expressions and then joinging them to your base table will work as well

you need to test different strategies to see which one is fastest/most understandable for you.

oddgeir
Contributor III
Contributor III
Author

Thanks for the input, even if it didn't give me a clear answer on what was the best approach. 

Yes I can calculate all in one load, by doing

 

 

 

factor1*factor2                                as dimension2
factor1*factor2*factor3                        as dimension3
factor1*factor2*factor3*factor4                as dimension4
[...]
factor1*factor2*factor3*factor4[...]*factorN   as dimensionN

 

 

 

However my question is if all the potential calculations of factor1*factor2 takes more effort than 

 

 

 

Table:
factor1*factor2                                as dimension2

Left join (Table)
dimension2*factor3                             as dimension3

Left join (Table)
dimension3*factor4                             as dimension4
[...]
Left join (Table)
dimensionN-1*factorN                           as dimensionN

 

 

...where factor1*factor2 is only calculated once (pr row), but there's additional effort in resident reload of Table. 
So if reload is no hassle the calculation is only done once instead of N times.
If reload is significant effort there's no benefit in reloading N times to avoid calculating factor1*factor2 N times. 

So basically, it's a question of what takes most effort in total, resident load or calculation.