count(distinct(id)) vs sum(countValue) - Current Best Practice?

Mon, 08 Feb 2016 15:05:02 GMT

I have always been of the mind that I should use sum(countValue) when building count(id) type logic, with the [countValue] having been added in the model script to facilitate this as a 1 where it should be counted.

As far as I know there are two good reasons for doing this -

sum([a column full of 1s]) was always a quicker, more efficient function than count(distinct(id))
You had control over which rows were included in a count.

However, on the last course I did (~6 months + ago) it was mentioned by the trainer that count() and sum() are now roughly equivalent from a performance point of view.

I am having some issues with count variables, that I may be able to get around using a count(distinct()) function in place of the existing sum() one, but the application in question is large - the count would be over 10's of millions of rows. Before I implement the change, I need to know how it will perform.

What is the current perceived wisdom of the the two methods?

Thanks

Mark

Re: count(distinct(id)) vs sum(countValue) - Current Best Practice?

Gysbert_Wassenaar — Mon, 08 Feb 2016 17:49:43 GMT

See this blog post: A Myth About Count(distinct …)

topic count(distinct(id)) vs sum(countValue) - Current Best Practice? in QlikView

count(distinct(id)) vs sum(countValue) - Current Best Practice?

Re: count(distinct(id)) vs sum(countValue) - Current Best Practice?