Solved: Which will give me best performance - using Boolea... - Qlik Community

lawrenceiow · ‎2013-12-20

Which will give me best performance - using Boolean Alegbra or nested IFs (or CrossTable)?

Reading this article "On Boolean Fields and Functions" it looks like Boolean Algebra will be faster than multiple IF statements but Set Analysis might be even better(?).

I noticed that a colleague has created a Straight Table with one Dimension and one Expression. The table displays values from a dataset of 650,000 records, at Q1, with the use of nested IFs in the Expression. When he showed me the expression I was stunned (and impressed by QlikView) to see that he had 60 IF statements nested together, yet it displayed the results without "thinking about it". However, when he tried to add a second dimension QlikView really struggled.

I remember being taught, back in the 80's, that Boolean Algebra was faster than IF statements but as computers got faster and faster I never got to a point where I noticed a significant difference, until now(?).

So my question is, where should my colleague go from here? My first thought was to use the CrossTable function in his script as his data has the values that he needs to SUM in columns and he has 60 of these. However, I was worried that, as he has 650,000 records, he would end up with over 130 million records by year end. As I'm new to QlikView I have no idea if this many records would bother QlikView or not. Also, I don't want to suggest that he changes the structure of his data in this way if he has many other charts, etc using the data too. It was at this point that I wondered if he would be better off converting the 60 level nest IF statement into a boolean formula instead. Currently his structure is like this:

IF([Category]='Category A' then Sum([Cost CatA]),
IF([Category]='Category B' then Sum([Cost CatB]),
IF([Category]='Category C' then Sum([Cost CatC]),
IF([Category]='Category D' then Sum([Cost CatD]),
60 lines later...
IF([Category]='Category ZZ8' then Sum([Cost CatZZ8])))etc)))

I would like to suggest he changes this to:

( fabs([Category]='Category A')*Sum([Cost CatA]) ) + ( fabs([Category]='Category B')*Sum([Cost CatB]) ) + ( fabs([Category]='Category C')*Sum([Cost CatC]) ) + etc

but I don't know if he'll be any better off doing it this way or he should be using something else instead.

The structure of his data is:

Field1, Field2, Field3, Field4, Cost_CatA, Cost_CatB, Cost_CatC, etc

His Straight Table uses Field4 (Dimension) and all of the Cost columns (Expression) and we'd like to add fields 2 and 3 as dimensions.

Hope someone can point me in the right direction as to the best way forward.

Many thanks.

cotiso_hanganu · ‎2015-06-12

Yes Lawrence, you've got it right.

I am sure that changing the approach from sum(if) to sum(set) will give you at least 10x faster improvement.

Have a look at some articles like these:

http://community.qlik.com/message/619841#619841

different result I got between using "if" clause and set analysis http://community.qlik.com/message/369085#369085

On the other hand, feeling sorry about you adjusting the 60 nested formulas, I would suggest you at least 3 ways to do the rewriting easier and faster:

1. use expression overview and Find and Replace (beware, there are some risks you should be aware)

2. put your big formula into a variable let's say vFormulaVariableName and call it in Expression Calculation Window with =$(vFormulaVariableName).

2.a. You can do this with method 1 from above, especially if you have to do it in several or even too many places

2.b. Afterwards, you will have to change only in one formula

3. use either Excel or Qlik scripting or Qlik expression

- to convert your formula from the first version to the second

- or to regenerate the formula in the way you need it

through string processing ... (in Qlik you can have a look at least at the followings: & / concat () / substring() / textbetween() )

(better to apply 3 in conjunction with method 2)

View solution in original post

swuehl · ‎2013-12-20

I would try to transform your crosstable to a straight table (using cross table load prefix).

This will significantly increase the number of records, but will also limit the width of your table. You can try to create an additional table for this data linked to your table containing Field1, Field2, Field3, Field4 using a surrogate key.

Using the table will make it much easier to create your charts and expressions, should also perform a lot better. But just do a test with your data, there isn't anything better than doing a test on your concrete set of data.

lawrenceiow · ‎2013-12-20

Thank you for your reply @swuehl, I was going to test my various options but didn't want to spend a lot of time re-writing the complex nested IF statement (then trying to figure out where I'd made typo errors) if I was on the wrong track. I was also worried about system resource if I was to create the extra table (ie un-crossed) in the document.

From your reply should I deduce that moving from a nested IF statement to a boolean structured statement would not give a significant enough performance increase to warrant the effort involved and therefore CrossTable is the way to go?

I must say I was surprised to find that our system supplier (one of QlikView's many partners) had built the data table in this way rather than a straight table but I presume it was what we asked for.

I shall start testing....
Thanks again.

DavidFoster1 · ‎2013-12-20

It looks like you are doing a pivoting action. Perhaps you should look at pivoting the data in the data extraction process.

lawrenceiow · ‎2013-12-20

In other words I should use CrossTable in the script to straighten my 'pivoted' data?

DavidFoster1 · ‎2013-12-20

Crosstable is one option. I tend to use crosstable with text sources. If you are using SQL server you may want to UNPIVOT the data in the initial query. My Oracle is rusty so I dont know if it has an equivalent but I would assume so.

Colin-Albert · ‎2013-12-20

The crosstable function will work with data from SQL sources OK, although the wizard is only available for Excel data.

The question is do you load fewer but wider rows from the pivoted data in SQL, then unpivot in QlikView; or unpivot in SQL first and load more rows but fewer columns into QlikView. The end result is the same, but this choice may be governed by the location of the SQL data on your network vs QV server location, and the performance of the SQL server vs the QV server.

The crosstable function in QlikView is straightforward you just need to load the data with the dimensions first followed the measures and no other columns after the measures.

cotiso_hanganu · ‎2015-06-10

Beside the above mentioned options, there is also SET ANALYSIS option standing by...

For your statements :

IF(condition1,SUM(field1),

IF(condition2,SUM(field2), ...

))

I would do this way

SUM({set 1} field1) + SUM({set 2} field2) + ...

PS: Please be careful alos about the usage of

sum(if(

and

if(sum(...

It's not the same !

And from my 15 years of Qlik development experience and 9 years of delivering Qlik trainings, in most cases people tend to use if(sum(... , while sum(if( is more appropriate

lawrenceiow · ‎2015-06-11

Thank you for your reply @Cotiso_Hanganu, do you mean like this:

Sum(IF([Category]='Category A' then [Cost CatA],
IF([Category]='Category B' then [Cost CatB],
IF([Category]='Category C' then [Cost CatC],
IF([Category]='Category D' then [Cost CatD],
60 lines later...
IF([Category]='Category ZZ8' then [Cost CatZZ8]))etc))))

or, if using set analysis:

Sum({<Category={'Category A'}>}[Cost CatA])+

Sum({<Category={'Category B'}>}[Cost CatB])+

Sum({<Category={'Category C'}>}[Cost CatC])+

Sum({<Category={'Category D'}>}[Cost CatD])+

etc

Sum({<Category={'Category ZZ8'}>}[Cost CatZZ8])

In the data, Category is a subset of Field4, Field4 is a subset of Field3, etc. Whilst the calculation time is as good as instant when Field4 is the only dimension, when Field 3 gets added as second dimension the calculation is much, much longer. My questions was around which method would work faster. I was hoping that someone in the Community would already have such experience to save me actually going ahead and changing the 60 nested IFs only to find that it makes no difference to the calculation speed.

Thanks again for your reply.

cotiso_hanganu · ‎2015-06-12