Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
Why is the pick(match()) combination a better choice than nested ifs looking at performance?
Have you read a post with an actual case study?
How does the match function work?
Qlik help says:
"match - script and chart function
The match function compares the first parameter with all the following ones and returns the number of the expression that matches. The comparison is case sensitive.
Syntax:
match( str, expr1 [ , expr2,...exprN ])"
What if multiple expressions match? I assume it returns the number of the first expression that matches?
Example:
Match(-1
, 1 <= 1
, 1 <= 2
, 1 <= 3
, 1 <= 4
, 1 <= 5
)
The example above returns 1. While all expressions are correct.
Does the match function evaluate all the cases or only the first one and when it is a match it skips the other cases?
Posts I've read:
Why is using Pick and Match functions combination thought to be faster than nested IF statements?
No answer here.
,Thanks for your input
It should be similar to how If structure gets evaluated with OR operator - If ( cond1 OR cond2 OR ...) . That is it should stop evaluation at the first TRUE condition. It would be very unexpected and surprising it that doesn't happen.
I remember seeing a post (I think from Henrik Cronstom) saying that QV does not use shortcut logic like that. There were some technical reasons for that. My experience is that complex Pick()s with many expressions can be very slow.
I can't say whether a nested IF is faster than a Pick(), but the Pick() expression can be easier to maintain. I suspect that it is a case of YMMV, so I would test each option if speed of execution is important (and it usually is).
Take a look on the answer from johnw here: Re: IF vs Pick Match.
- Marcus
While possibly true, it would be very surprising to me if QlikView's architecture somehow prevented this sort of optimization.
Well, we can try to get some evidence. For each of 100 million rows, I generated two random characters between A and Z, but with a 90% chance of being A for the first, a 90% chance of being Z for the second. I did a sum(pick(match())) against the alphabet in order using the two fields that favored A or Z.
If QlikView exits on the first match, I would expect the A expression to be significantly faster than the Z expression. If it checks every value regardless, I would expect the two expressions to take the same amount of time. That won't be proof one way or another regardless of what we see, but it would at least be suggestive.
Each chart on separate tabs. Open on main tab. Close QlikView, open, switch to tab, check calc time, close QlikView, repeat. Here are the results of three checks each chart:
90% A: 390 390 406
90% Z: 406 406 391
Well, I think that's suggestive that you may be right. It's possible there's some other explanation, but what you said seems like a straightforward explanation, particularly if you think you remember Henric stating it's the case. I just can't think of why it would be that way.
I think here ("short circuiting")
Ah, thank you. At least there's this: "The only improvement that could be made, is to check whether the Condition is an aggregation or just a check of a variable value, and use short circuiting for the latter case."
I'm checking a field rather than a variable in my test, but there's no aggregation inside of my pick(match()), and so it seems reasonable that we expected it to short circuit, but QlikView doesn't because it's operating on the assumption that everything is some sort of aggregation expression rather than having separate paths depending on whether it is or isn't. It might be nice if the product had that optimization. On the other hand, I'M the one deoptimizing the whole thing by putting my sum() on the outside instead of on the inside. It's MY intentional mistake that's causing the slowness. Had I done it "the QlikView way" with the sum() on the inside, everything probably would have been fine.