Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Which will give me best performance - using Boolean Alegbra or nested IFs (or CrossTable)?
Reading this article "On Boolean Fields and Functions" it looks like Boolean Algebra will be faster than multiple IF statements but Set Analysis might be even better(?).
I noticed that a colleague has created a Straight Table with one Dimension and one Expression. The table displays values from a dataset of 650,000 records, at Q1, with the use of nested IFs in the Expression. When he showed me the expression I was stunned (and impressed by QlikView) to see that he had 60 IF statements nested together, yet it displayed the results without "thinking about it". However, when he tried to add a second dimension QlikView really struggled.
I remember being taught, back in the 80's, that Boolean Algebra was faster than IF statements but as computers got faster and faster I never got to a point where I noticed a significant difference, until now(?).
So my question is, where should my colleague go from here? My first thought was to use the CrossTable function in his script as his data has the values that he needs to SUM in columns and he has 60 of these. However, I was worried that, as he has 650,000 records, he would end up with over 130 million records by year end. As I'm new to QlikView I have no idea if this many records would bother QlikView or not. Also, I don't want to suggest that he changes the structure of his data in this way if he has many other charts, etc using the data too. It was at this point that I wondered if he would be better off converting the 60 level nest IF statement into a boolean formula instead. Currently his structure is like this:
IF([Category]='Category A' then Sum([Cost CatA]),
IF([Category]='Category B' then Sum([Cost CatB]),
IF([Category]='Category C' then Sum([Cost CatC]),
IF([Category]='Category D' then Sum([Cost CatD]),
60 lines later...
IF([Category]='Category ZZ8' then Sum([Cost CatZZ8])))etc)))
I would like to suggest he changes this to:
( fabs([Category]='Category A')*Sum([Cost CatA]) ) + ( fabs([Category]='Category B')*Sum([Cost CatB]) ) + ( fabs([Category]='Category C')*Sum([Cost CatC]) ) + etc
but I don't know if he'll be any better off doing it this way or he should be using something else instead.
The structure of his data is:
Field1, Field2, Field3, Field4, Cost_CatA, Cost_CatB, Cost_CatC, etc
His Straight Table uses Field4 (Dimension) and all of the Cost columns (Expression) and we'd like to add fields 2 and 3 as dimensions.
Hope someone can point me in the right direction as to the best way forward.
Many thanks.
This is brilliant, thank you for your reply @cotiso_hanganu and thanks too to @john_witherspoon for providing the evidence as seen in your first link.
Regards