Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
It is said that using Sum() instead of count(Distinct ) the performance will increase. But I have duplicate Key values in my data.
So how can I use it. Can anyone help me on this?
Regards,
Anjali Gupta
Despite the Myth of Count Distinct, record counters are still very useful and I use them extensively. Well worth checking out the suggestions / links that have been suggested.
As others have said; today you can use Count(Distinct) without a performance issue.
If you really want to use SUM instead of count(Distinct) you can do the following:
Fact:
load %Keytobecounted, Facts, numbers, etc from fact.qvd (qvd);
CountingTable:
load distinct %Keytobecounted, 1 as Keycounter resident Fact;
This creates a table with the unique keys and a field you can sum that will equal a count of keys.
now you can use the expression Sum(Keycounter) instead of Count(Distinct %Keytobecounted).
Anjali wrote" It is said that using Sum() instead of count(Distinct ) the performance will increase.".
It is not correct. We can use either one. Count(Distinct) won't reduce the performance.
Read tresco's link.
Like many things this all depends......
Count Distinct used to be single threaded and hence could be slow. Some releases ago it was made multi-threaded which made it faster dependent on how many processor cores you have.
But it will take more processor cycles to execute, albeit striped across available cores.
Processor cycles = Electricity used = Heat produced = Global warming
I know I am being pedantic here, generating electricity produces CO2 and every little helps.