Hi everyone,
Quick question: why does a Qlik set analysis expression with Sum() return 0 rather than a blank/null?
I want to be able to tell whether the aggregate returned null or 0, since they mean two different things to our business.
Thanks
Hi Tomasz,
could you share a concrete example? There are many possible solutions to this problem.
Beck
There was recently a similar discussion, which also included a possible workaround.
It is normal in all the software I have been using that aggregation functions like Sum, Count etc. don't return nulls.
By the way, a set expression (aka set analysis) doesn't return any numbers itself. It only selects the rows or values that are eventually turned into a number by an aggregation function.
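Petter's point can be illustrated outside Qlik: a typical aggregation first drops nulls and then folds whatever is left. When every input was null, a Sum()-style function folds an empty sequence, and the sum of an empty sequence is defined as 0. A minimal sketch in Python (not Qlik internals, just an analogy):

```python
# Sketch of typical aggregation behaviour: drop nulls first, then fold.
values = [None, None, None]                     # a field containing only nulls
non_null = [v for v in values if v is not None]  # the null-stripped input

total = sum(non_null)   # sum() of an empty sequence is defined as 0
print(total)            # 0 -- which is essentially what Tomasz is seeing
```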
Petter, as far as I remember there are DBMS that handle that differently, e.g.
All aggregate functions except COUNT(*), GROUPING, and GROUPING_ID ignore nulls. You can use the NVL function in the argument to an aggregate function to substitute a value for a null. COUNT and REGR_COUNT never return null, but return either a number or zero. For all the remaining aggregate functions, if the data set contains no rows, or contains only rows with nulls as arguments to the aggregate function, then the function returns null.
Even QV returns NULL for most aggregation functions, just not for Count(), Concat() and Sum() [but it does for Max() / Min() / Avg(). Maybe I missed a function in one or the other group; I just want to show that the handling differs].
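Interestingly, Python's built-in functions show a similar asymmetry to the one described for QlikView: sum() of an empty sequence yields 0, while max() and min() refuse to produce a value at all (they raise instead of returning a "null"). A small illustration, offered only as an analogy:

```python
# sum() has a natural "empty" answer (0, the additive identity)...
print(sum([]))   # 0

# ...but max()/min() have no sensible value for an empty sequence.
try:
    max([])
except ValueError as err:
    print("max([]) has no answer:", err)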
When reading through the tech doc, there is no rationale given for why Sum() handles NULL this way (or I missed it).
[Note that in the thread I referenced above, there is a small discussion about possible workarounds.]
I got used to the NULL handling in QV, but I can understand why, for Tomasz and Danny (the OP of the other thread), a zero returned from SUMming only NULLs means something different than NULL.
Maybe hic can shed some light on the rationale behind it.
Ignoring values with null as input is something different from returning a null: it is a matter of inputs versus outputs. My point is really that it is hard to define one way as right and the other as wrong... What matters is knowing how it is implemented in each piece of software, and how to handle it so your analysis returns correct results.
Nulls are often considered notorious - just have a look at what Joe Celko writes about them... He is one of the SQL gurus with heated blogs and discussions around the topic. It is a problem that nulls are handled quite differently from system to system. And you are right that they are handled differently in different DBMSs... Hard to keep track of... Excel actually seems to handle it the same way as QlikView.
I admit I didn't really know Joe Celko. It's a rainy day so I searched around the net to see what he's got to tell me:
In the original SQL, we had a limited set of simple aggregate functions. Their general format was "<function name> ([DISTINCT | ALL] <expression>)". The first thing these functions do is to remove all of the NULLs from the data. The function could be applied to an empty set, if only NULLs were returned. What is the sum or average of an empty set? Following the usual conventions of SQL, empty sets return NULL, not zero. Zero is an actual value; that was a very big debate in the Middle Ages, when Europeans started seeing zero showing up in those newfangled Hindu-Arabic numerals.
I agree that different systems handle things differently and you need to cope with these differences as a developer.
I don't really want to start a hot debate here.
As far as I understood the OP of this thread, and specifically the OP of the other thread, they are curious to know the rationale behind this design decision in Qlik software, if there is one.
Up to now, I personally wouldn't accept the rationale "that's the way it's commonly handled".
If Excel shows the same behaviour (I haven't worked much with NULL in Excel), this might explain QlikView's implementation, looking at the history of QlikView
(i.e. I believe Excel was taken as a gold standard for some time in the beginning, concerning the expected output of functions)
There are as many opinions here as there are SQL "gurus"...
However, when these functions were designed, the arguments were like this:
For aggregations of numbers, you should look at Scales of Measurement. An absence of Amounts (Ratios) should be regarded as Zero, while an absence of Coordinates (Intervals) should be regarded as NULL.
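HIC's design rule could be sketched like this (hypothetical helper names, not Qlik internals): for ratio-scale data (amounts), an absence of values collapses to 0; for interval-scale data (coordinates), an absence of values stays NULL.

```python
from typing import Iterable, Optional

def aggregate_amount(values: Iterable[Optional[float]]) -> float:
    """Ratio scale (amounts): an absence of values means 0."""
    return sum(v for v in values if v is not None)

def aggregate_coordinate(values: Iterable[Optional[float]]) -> Optional[float]:
    """Interval scale (coordinates): an absence of values means NULL."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None

print(aggregate_amount([None, None]))      # 0    (like Sum() in QlikView)
print(aggregate_coordinate([None, None]))  # None (like Max() in QlikView)
```

Under this reading, Sum() operates on amounts and so returns 0, while Max()/Min()/Avg() operate on values where "no answer" is meaningful and so return NULL.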
HIC
Thanks for giving us some insights into the reasoning behind this behaviour, Henric.
I know there are a lot of different opinions, but it's good to hear the rationale behind the implementation first-hand.
Really appreciated!