Qlik Community

Qlik Sense App Development

Discussion board where members can learn more about Qlik Sense App Development and Usage.

Announcements
Coming your way, the Qlik Data Revolution Virtual Summit. October 27-29. REGISTER
cancel
Showing results for 
Search instead for 
Did you mean: 
Highlighted
Contributor II
Contributor II

Another approach to implement Count distinct

I have implemented count distinct in a bit different way. Not sure if this will result in better performance or not. 

In script load I used autonumber(Field1,"Type1") as _Key. Then I can get the max of that _Key using peek. I create another table populating it with 2 fields .

LET TtlNum = Peek('_Key',-1, 'TMP');
LET Start = 1;

Table1:
LOAD
$(Start) + Rowno() -1 as [_Key],
1 as Field2
AUTOGENERATE $(TtlNum);

Now in my data model these 2 tables will join on _Key and instead of using count(distinct _Key) I can use sum(Field2) in my expressions. Since I will not be using distinct in my expressions so I am hoping this should be faster.

1 Solution

Accepted Solutions
Highlighted
MVP & Luminary
MVP & Luminary

It's an interesting and clever approach, but appears to be more complex than necessary.  Could you skip the AutoNumber and do instead:

KeyCounters:
LOAD FieldValue('Key', RecNo()) as Key,
1 as KeyCounter
AutoGenerate FieldValueCount('Key');

and then 

Sum(KeyCounter) 
or Count(KeyCounter) -- which might be faster than Sum

-Rob
http://masterssummit.com
http://qlikviewcookbook.com
http://www.easyqlik.com

View solution in original post

5 Replies
Highlighted
MVP
MVP

Hi Sandeep,

Thanks for sharing your new approach. Without much diving deep into your approach, I too am not really sure if this would be faster or not. However, I would say and highlight that it is a myth now. This count distinct operation used to be single-threaded, but that is long back; today it is multi-threaded. Please check this : A Myth About Count(distinct …)  

Therefore, without discouraging your effort I would say - you should not be much worried about using using count(distinct...

 

I appreciate that you thought of some new approach..🙂

 

Highlighted
MVP & Luminary
MVP & Luminary

It's an interesting and clever approach, but appears to be more complex than necessary.  Could you skip the AutoNumber and do instead:

KeyCounters:
LOAD FieldValue('Key', RecNo()) as Key,
1 as KeyCounter
AutoGenerate FieldValueCount('Key');

and then 

Sum(KeyCounter) 
or Count(KeyCounter) -- which might be faster than Sum

-Rob
http://masterssummit.com
http://qlikviewcookbook.com
http://www.easyqlik.com

View solution in original post

Highlighted
Contributor II
Contributor II

Thanks Tresesco and Rob!!!

This works well also. Only reason I put  autonumber was that my key is combination of string elements.

Question still though is whether this approach of getting distinct count be faster then using the actual distinct in expression. 

Highlighted
Luminary
Luminary

I would think that it *may* be marginally faster with large data volumes and in situations with concatenated strings as keys, but the only way to know for sure would be to test it. If I'm understanding your solution correctly you are introducing another table so there's a hop between tables introduced which may have a small overhead. Henric Cronstrom wrote an article on the impact of hops years ago (https://community.qlik.com/t5/Qlik-Design-Blog/A-Myth-about-the-Number-of-Hops/bc-p/1474641#M6844) but that was before the more recent version of the engine was developed and it may well perform differently.

If you do get a chance to test it out I'm sure many would be keen to know the results! Thanks for an interesting post.

Highlighted
MVP & Luminary
MVP & Luminary

If you don't need to display the keys as data consider using the new AutoNumber script statement to reduce your RAM usage.

AutoNumber Key;

The count distinct question has some complexities. Perhaps I'll write a blog post about it...

-Rob