Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

count(distinct(id)) vs sum(countValue) - Current Best Practice?

I have always been of the mind that I should use sum(countValue) when building count(id) type logic, with the [countValue] having been added in the model script to facilitate this as a 1 where it should be counted.

As far as I know there are two good reasons for doing this -

  1. sum([a column full of 1s]) was always a quicker, more efficient function than count(distinct(id))
  2. You had control over which rows were included in a count.

However, on the last course I did (~6 months + ago) it was mentioned by the trainer that count() and sum() are now roughly equivalent from a performance point of view.

I am having some issues with count variables, that I may be able to get around using a count(distinct()) function in place of the existing sum() one, but the application in question is large - the count would be over 10's of millions of rows. Before I implement the change, I need to know how it will perform.

What is the current perceived wisdom of the the two methods?

Thanks

Mark

1 Reply
Gysbert_Wassenaar
Partner - Champion III
Partner - Champion III

See this blog post: A Myth About Count(distinct …)


talk is cheap, supply exceeds demand