Hi everyone,
Quick question: why does a Qlik set analysis expression with Sum() return 0 rather than a blank/null?
I want to be able to tell whether the aggregate returned null or 0, since they mean two different things to our business.
Thanks
Hi Tomasz,
could you share a concrete example? There are many possible solutions to this problem.
Beck
There was recently a similar discussion, which also included a possible workaround.
It is normal in all the software I have been using that aggregation functions like Sum, Count etc. don't return nulls.
By the way, a set expression (aka set analysis) doesn't return any numbers itself. It only selects the rows or values that are eventually turned into a number by an aggregation function.
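Petter's point can be illustrated outside Qlik: a typical aggregation first drops nulls and then folds whatever is left. When every input was null, a Sum()-style function folds an empty sequence, and the sum of an empty sequence is defined as 0. A minimal sketch in Python (not Qlik internals, just an analogy):

```python
# Sketch of typical aggregation behaviour: drop nulls first, then fold.
values = [None, None, None]                     # a field containing only nulls
non_null = [v for v in values if v is not None]  # the null-stripped input

total = sum(non_null)   # sum() of an empty sequence is defined as 0
print(total)            # 0 -- which is essentially what Tomasz is seeing
```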
Petter, as far as I remember there are DBMS that handle that differently, e.g.
All aggregate functions except COUNT(*), GROUPING, and GROUPING_ID ignore nulls. You can use the NVL function in the argument to an aggregate function to substitute a value for a null. COUNT and REGR_COUNT never return null, but return either a number or zero. For all the remaining aggregate functions, if the data set contains no rows, or contains only rows with nulls as arguments to the aggregate function, then the function returns null.
Even QV returns NULL for most aggregation functions, just not for Count(), Concat() and Sum() [but it does for Max() / Min() / Avg(). Maybe I missed a function in one or the other group; I just want to show that the handling differs].
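Interestingly, Python's built-in functions show a similar asymmetry to the one described for QlikView: sum() of an empty sequence yields 0, while max() and min() refuse to produce a value at all (they raise instead of returning a "null"). A small illustration, offered only as an analogy:

```python
# sum() has a natural "empty" answer (0, the additive identity)...
print(sum([]))   # 0

# ...but max()/min() have no sensible value for an empty sequence.
try:
    max([])
except ValueError as err:
    print("max([]) has no answer:", err)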
When reading through the tech doc, there is no rationale given for why Sum() handles NULL this way (or I missed it).
[Note that in the thread I referenced above, there is a small discussion about possible workarounds.]
I got used to the NULL handling in QV, but I can understand why, for Tomasz and Danny (the OP of the other thread), a zero returned from SUMming only NULLs means something different than NULL.
Maybe hic can shed some light on the rationale behind it.
Ignoring values with null as input is something different from returning a null: it is a matter of inputs versus outputs. My point is really that it is hard to define one way as right and the other as wrong... What matters is knowing how it is implemented in each piece of software, and how to handle it so your analysis returns correct results.
Nulls are often considered notorious - just have a look at what Joe Celko writes about them... He is one of the SQL gurus with heated blogs and discussions around the topic. It is a problem that nulls are handled quite differently from system to system. And you are right that they are handled differently in different DBMSs... Hard to keep track of... Excel actually seems to handle it the same way as QlikView.
I admit I didn't really know Joe Celko. It's a rainy day so I searched around the net to see what he's got to tell me:
In the original SQL, we had a limited set of simple aggregate functions. Their general format was "<function name> ([DISTINCT | ALL] <expression>)". The first thing these functions do is to remove all of the NULLs from the data. The function could be applied to an empty set, if only NULLs were returned. What is the sum or average of an empty set? Following the usual conventions of SQL, empty sets return NULL, not zero. Zero is an actual value; that was a very big debate in the Middle Ages, when Europeans started seeing zero showing up in those newfangled Hindu-Arabic numerals.
I agree that different systems handle things differently and you need to cope with these differences as a developer.
I don't really want to start a hot debate here.
As far as I understood the OP of this thread, and specifically the OP of the other thread, they are curious to know the rationale behind this design decision in Qlik software, if there is one.
Up to now, I personally wouldn't accept the rationale "that's the way it's commonly handled".
If Excel shows the same behaviour (I haven't worked much with NULL in Excel), this might explain QlikView's implementation, looking at the history of QlikView
(i.e. I believe Excel was taken as a gold standard for some time in the beginning, concerning the expected output of functions)
There are as many opinions here as there are SQL "gurus"...
However, when these functions were designed, the arguments were like this:
For aggregations of numbers, you should look at Scales of Measurement. An absence of Amounts (Ratios) should be regarded as Zero, while an absence of Coordinates (Intervals) should be regarded as NULL.
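HIC's design rule could be sketched like this (hypothetical helper names, not Qlik internals): for ratio-scale data (amounts), an absence of values collapses to 0; for interval-scale data (coordinates), an absence of values stays NULL.

```python
from typing import Iterable, Optional

def aggregate_amount(values: Iterable[Optional[float]]) -> float:
    """Ratio scale (amounts): an absence of values means 0."""
    return sum(v for v in values if v is not None)

def aggregate_coordinate(values: Iterable[Optional[float]]) -> Optional[float]:
    """Interval scale (coordinates): an absence of values means NULL."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None

print(aggregate_amount([None, None]))      # 0    (like Sum() in QlikView)
print(aggregate_coordinate([None, None]))  # None (like Max() in QlikView)
```

Under this reading, Sum() operates on amounts and so returns 0, while Max()/Min()/Avg() operate on values where "no answer" is meaningful and so return NULL.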
HIC
Thanks for giving us some insights into the reasoning behind this behaviour, Henric.
I know there are a lot of different opinions, but it's good to hear the rationale behind the implementation first-hand.
Really appreciated!