Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
hic
Former Employee
Former Employee

When counting something in QlikView, you should use the Count() aggregation function. But – should you use Count or Count distinct?

 

Although very similar, the two constructions return completely different results. And often the wrong one is used…

 

So, let me start by clarifying what the two constructions really count:

  • Count( <expression> ) counts the number of records in the data.
  • Count( distinct <expression> ) counts the number of distinct values of the expression.

 

Only instances where the expression isn't NULL are included. Further, if the expression contains references to more than one field, an n-tuple (a temporary table) is created from the cross product of the constituent fields, and the count is performed in the n-tuple instead of in the data table.

 

This means that in the common case – when the <expression> is a simple field reference – the Count(…) is equivalent to the number of rows in the data table, whereas the Count(distinct …) is equivalent to the number of rows in the symbol table. Read more about data tables and symbol tables here.

 

An example: You have a fact table and you want to count the number of orders. Then Count(OrderID) will return the number of records in the fact table, which often is a number larger than the number of orders, since one order usually can have several order lines. In this case you probably want to count the number of unique Order IDs, and should most likely choose Count(distinct OrderID) instead.

 

So, Count(OrderID) does not count the orders!

Image2.png

 

Another trap I have seen people fall into, is when a key field is used as parameter in the Count() function. If the distinct clause is used, there is no problem: The number of distinct values is well defined also for key fields. But if you omit the distinct clause, the number is not well defined: In which data table should QlikView count the number of records? There are several possible source tables, since the field is a key.

 

The Count(Key) will however return a number, but this number is not always the number you want. Sometimes it will be the count from the "wrong" table. So I would recommend that you never use a Count() function on a key field, unless you use the distinct clause.

 

Otherwise, the work-around for using the Count() function (without the distinct clause) on a key field, is the create a copy of this field in the table where you want to count the records, and use this field inside the Count() instead of the key field.

 

Bottom line: For key fields, either use the distinct clause, or the above work-around.

 

HIC

 

See also A myth about Count distinct

11 Comments
Not applicable

Otherwise, the work-around for using the Count() function (without the distinct clause) on a key field, is the create a copy of this field in the table where you want to count the records, and use this field inside the Count() instead of the key field.

If you do this in a large data model be very careful, as QV will not be able to compress the data (all values will be unique by virtue of the fact the field is a key), so this is likely to impact your RAM and storage requirements dramatically.

43,673 Views
martinpedersen
Creator
Creator

Nice article.

My experience is that you will get better performace, if you create a count-field direct in the load script. For instance if you have a distinct OrderTable add the  "1 as OrderIdCount" and then you can use the expression "Sum(OrderIdCount)" in your objects.

0 Likes
43,673 Views
hic
Former Employee
Former Employee

@ Graeme Smith If the key values are unique, then you don't need the work-around. Just use Count(distinct Key). The work-around is only necessary if the key values are not unique.

@ Martin Pedersen My experience is the opposite. I tested exactly this for performance. Read more about my investigations on A Myth About Count(distinct …).

HIC

43,673 Views
martinpedersen
Creator
Creator

Hi Henric,

I think it depends on the complexity on your orderID-keys. I got a case where the "OrderIds" are very long text keys and my performance seems to improve a lot when i use a sum(Count-Field) instead - but perhaps i could use a autonumber of the OrderId instead and get even better performace...

0 Likes
43,673 Views
Not applicable

Henric Cronström - point taken.  Still, in the context of the OrderID example you gave, there could still potentially be a large number of unique OrderID's in the orders table (ultimately depends on the nature of the orders and average number of order line items).  Duplicating this field could still have significant impact on RAM and storage requirements in larger data models, and I still caution people to think carefully about duplicating fields with large numbers of discrete values (irrespective of whether they are unique or not!).

43,673 Views
francisconunez
Contributor
Contributor

Hola Henric,

Tengo una duda sobre la opción de contar el número de casos distintos cuando se incluye una condición:

¿Cómo podría incluir "DISTINCT en la fórmula "Count ( if(ZONE= 'Z1',[CELL]) )"?

Los campos "ZONE" y "CELL" están en distinta tabla

Muchas gracias por adelantado

0 Likes
36,493 Views
JohnREO
Contributor
Contributor

thanks, you saved me again

0 Likes
25,876 Views
1501
Contributor
Contributor

Hi @hic ,

How can I show zero values on the chart using count(distinct) function?

P/S: The checkbox in data handling has been ticked but still unable to show zero values

Thanks,

0 Likes
20,862 Views
caloyskicb
Contributor
Contributor

Hi @hic 

I'm very interested in this Count () aggregate function a little bit more. So, I have a table where the Household ID field may contain "Null" values because the record is for a single client that does not belong to a household. I want to be able to count the records with Household IDs as well as the records with "null" Household IDs. I tried:

Count({<[Household Head of Household]={'Yes'}>} distinct[Client Id])+Count({<[Household Id]={0}>} distinct[Client Id])

It did not work. Please help. Thanks

0 Likes
20,832 Views
barnaby
Contributor II
Contributor II

G'day @caloyskicb ,

I would solve this problem by replacing NULL values in the load script with a value such as "(None)". Then you don't need any fancy code in the count().

Cheers,

Barnaby.

20,812 Views