Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
cheburashka
Creator III
Creator III

Pick(Match()) vs multiple nested if()'s a case study

Hello,

Why is the pick(match()) combination a better choice than nested ifs looking at performance?

Have you read a post with an actual case study?

How does the match function work?

Qlik help says:

"match - script and chart function

The match function compares the first parameter with all the following ones and returns the number of the expression that matches. The comparison is case sensitive.

Syntax: 

match( str, expr1 [ , expr2,...exprN ])"

What if multiple expressions match? I assume it returns the number of the first expression that matches?

Example:

Match(-1

  , 1 <= 1

  , 1 <= 2

  , 1 <= 3

  , 1 <= 4

  , 1 <= 5

  )

The example above returns 1. While all expressions are correct.

Does the match function evaluate all the cases or only the first one and when it is a match it skips the other cases?


Posts I've read:

Why is using Pick and Match functions combination thought to be faster than nested IF statements?

No answer here.

,Thanks for your input

6 Replies
tresesco
MVP
MVP

It should be similar to how If structure gets evaluated with OR operator - If ( cond1 OR cond2 OR ...) . That is it should stop evaluation at the first TRUE condition. It would be very unexpected and surprising it that doesn't happen.

jonathandienst
Partner - Champion III
Partner - Champion III

I remember seeing a post (I think from Henrik Cronstom) saying that QV does not use shortcut logic like that. There were some technical reasons for that. My experience is that complex Pick()s with many expressions can be very slow.

I can't say whether a nested IF is faster than a Pick(), but the Pick() expression can be easier to maintain. I suspect that it is a case of YMMV, so I would test each option if speed of execution is important (and it usually is).

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
marcus_sommer

Take a look on the answer from johnw here: Re: IF vs Pick Match.

- Marcus

johnw
Champion III
Champion III

While possibly true, it would be very surprising to me if QlikView's architecture somehow prevented this sort of optimization.

Well, we can try to get some evidence. For each of 100 million rows, I generated two random characters between A and Z, but with a 90% chance of being A for the first, a 90% chance of being Z for the second. I did a sum(pick(match())) against the alphabet in order using the two fields that favored A or Z.

If QlikView exits on the first match, I would expect the A expression to be significantly faster than the Z expression. If it checks every value regardless, I would expect the two expressions to take the same amount of time. That won't be proof one way or another regardless of what we see, but it would at least be suggestive.

Each chart on separate tabs. Open on main tab. Close QlikView, open, switch to tab, check calc time, close QlikView, repeat. Here are the results of three checks each chart:


90% A:  390 390 406
90% Z:  406 406 391

Well, I think that's suggestive that you may be right. It's possible there's some other explanation, but what you said seems like a straightforward explanation, particularly if you think you remember Henric stating it's the case. I just can't think of why it would be that way.

maxgro
MVP
MVP

I think here ("short circuiting")

Re: help - performance improvements

johnw
Champion III
Champion III

Ah, thank you. At least there's this: "The only improvement that could be made, is to check whether the Condition is an aggregation or just a check of a variable value, and use short circuiting for the latter case."

I'm checking a field rather than a variable in my test, but there's no aggregation inside of my pick(match()), and so it seems reasonable that we expected it to short circuit, but QlikView doesn't because it's operating on the assumption that everything is some sort of aggregation expression rather than having separate paths depending on whether it is or isn't. It might be nice if the product had that optimization. On the other hand, I'M the one deoptimizing the whole thing by putting my sum() on the outside instead of on the inside. It's MY intentional mistake that's causing the slowness. Had I done it "the QlikView way" with the sum() on the inside, everything probably would have been fine.