Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
sandeepgather
Contributor II
Contributor II

Another approach to implement Count distinct

I have implemented count distinct in a bit different way. Not sure if this will result in better performance or not. 

In script load I used autonumber(Field1,"Type1") as _Key. Then I can get the max of that _Key using peek. I create another table populating it with 2 fields .

LET TtlNum = Peek('_Key',-1, 'TMP');
LET Start = 1;

Table1:
LOAD
$(Start) + Rowno() -1 as [_Key],
1 as Field2
AUTOGENERATE $(TtlNum);

Now in my data model these 2 tables will join on _Key and instead of using count(distinct _Key) I can use sum(Field2) in my expressions. Since I will not be using distinct in my expressions so I am hoping this should be faster.

1 Solution

Accepted Solutions
rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

It's an interesting and clever approach, but appears to be more complex than necessary.  Could you skip the AutoNumber and do instead:

KeyCounters:
LOAD FieldValue('Key', RecNo()) as Key,
1 as KeyCounter
AutoGenerate FieldValueCount('Key');

and then 

Sum(KeyCounter) 
or Count(KeyCounter) -- which might be faster than Sum

-Rob
http://masterssummit.com
http://qlikviewcookbook.com
http://www.easyqlik.com

View solution in original post

5 Replies
tresesco
MVP
MVP

Hi Sandeep,

Thanks for sharing your new approach. Without much diving deep into your approach, I too am not really sure if this would be faster or not. However, I would say and highlight that it is a myth now. This count distinct operation used to be single-threaded, but that is long back; today it is multi-threaded. Please check this : A Myth About Count(distinct …)  

Therefore, without discouraging your effort I would say - you should not be much worried about using using count(distinct...

 

I appreciate that you thought of some new approach..🙂

 

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

It's an interesting and clever approach, but appears to be more complex than necessary.  Could you skip the AutoNumber and do instead:

KeyCounters:
LOAD FieldValue('Key', RecNo()) as Key,
1 as KeyCounter
AutoGenerate FieldValueCount('Key');

and then 

Sum(KeyCounter) 
or Count(KeyCounter) -- which might be faster than Sum

-Rob
http://masterssummit.com
http://qlikviewcookbook.com
http://www.easyqlik.com

sandeepgather
Contributor II
Contributor II
Author

Thanks Tresesco and Rob!!!

This works well also. Only reason I put  autonumber was that my key is combination of string elements.

Question still though is whether this approach of getting distinct count be faster then using the actual distinct in expression. 

Rodj
Luminary Alumni
Luminary Alumni

I would think that it *may* be marginally faster with large data volumes and in situations with concatenated strings as keys, but the only way to know for sure would be to test it. If I'm understanding your solution correctly you are introducing another table so there's a hop between tables introduced which may have a small overhead. Henric Cronstrom wrote an article on the impact of hops years ago (https://community.qlik.com/t5/Qlik-Design-Blog/A-Myth-about-the-Number-of-Hops/bc-p/1474641#M6844) but that was before the more recent version of the engine was developed and it may well perform differently.

If you do get a chance to test it out I'm sure many would be keen to know the results! Thanks for an interesting post.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

If you don't need to display the keys as data consider using the new AutoNumber script statement to reduce your RAM usage.

AutoNumber Key;

The count distinct question has some complexities. Perhaps I'll write a blog post about it...

-Rob