Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

count(distinct(id)) vs sum(countValue) - Current Best Practice?

I have always been of the mind that I should use sum(countValue) when building count(id) type logic, with the [countValue] having been added in the model script to facilitate this as a 1 where it should be counted.

As far as I know there are two good reasons for doing this -

  1. sum([a column full of 1s]) was always a quicker, more efficient function than count(distinct(id))
  2. You had control over which rows were included in a count.

However, on the last course I did (~6 months + ago) it was mentioned by the trainer that count() and sum() are now roughly equivalent from a performance point of view.

I am having some issues with count variables, that I may be able to get around using a count(distinct()) function in place of the existing sum() one, but the application in question is large - the count would be over 10's of millions of rows. Before I implement the change, I need to know how it will perform.

What is the current perceived wisdom of the the two methods?

Thanks

Mark

1 Reply
Gysbert_Wassenaar

See this blog post: A Myth About Count(distinct …)


talk is cheap, supply exceeds demand