Solved: Another approach to implement Count distinct - Qlik Community

sandeepgather · ‎2019-12-03

I have implemented count distinct in a bit different way. Not sure if this will result in better performance or not.

In script load I used autonumber(Field1,"Type1") as _Key. Then I can get the max of that _Key using peek. I create another table populating it with 2 fields .

LET TtlNum = Peek('_Key',-1, 'TMP');
LET Start = 1;

Table1:
LOAD
$(Start) + Rowno() -1 as [_Key],
1 as Field2
AUTOGENERATE $(TtlNum);

Now in my data model these 2 tables will join on _Key and instead of using count(distinct _Key) I can use sum(Field2) in my expressions. Since I will not be using distinct in my expressions so I am hoping this should be faster.

rwunderlich · ‎2019-12-04

It's an interesting and clever approach, but appears to be more complex than necessary. Could you skip the AutoNumber and do instead:

KeyCounters:
LOAD FieldValue('Key', RecNo()) as Key,
1 as KeyCounter
AutoGenerate FieldValueCount('Key');

and then

Sum(KeyCounter)
or Count(KeyCounter) -- which might be faster than Sum

-Rob
http://masterssummit.com
http://qlikviewcookbook.com
http://www.easyqlik.com

View solution in original post

tresesco · ‎2019-12-04

Hi Sandeep,

Thanks for sharing your new approach. Without much diving deep into your approach, I too am not really sure if this would be faster or not. However, I would say and highlight that it is a myth now. This count distinct operation used to be single-threaded, but that is long back; today it is multi-threaded. Please check this : A Myth About Count(distinct …)

Therefore, without discouraging your effort I would say - you should not be much worried about using using count(distinct...

I appreciate that you thought of some new approach..🙂

rwunderlich · ‎2019-12-04

It's an interesting and clever approach, but appears to be more complex than necessary. Could you skip the AutoNumber and do instead:

KeyCounters:
LOAD FieldValue('Key', RecNo()) as Key,
1 as KeyCounter
AutoGenerate FieldValueCount('Key');

and then

Sum(KeyCounter)
or Count(KeyCounter) -- which might be faster than Sum

-Rob
http://masterssummit.com
http://qlikviewcookbook.com
http://www.easyqlik.com

sandeepgather · ‎2019-12-04

Thanks Tresesco and Rob!!!

This works well also. Only reason I put autonumber was that my key is combination of string elements.

Question still though is whether this approach of getting distinct count be faster then using the actual distinct in expression.

Rodj · ‎2019-12-04

I would think that it *may* be marginally faster with large data volumes and in situations with concatenated strings as keys, but the only way to know for sure would be to test it. If I'm understanding your solution correctly you are introducing another table so there's a hop between tables introduced which may have a small overhead. Henric Cronstrom wrote an article on the impact of hops years ago (https://community.qlik.com/t5/Qlik-Design-Blog/A-Myth-about-the-Number-of-Hops/bc-p/1474641#M6844) but that was before the more recent version of the engine was developed and it may well perform differently.

If you do get a chance to test it out I'm sure many would be keen to know the results! Thanks for an interesting post.

rwunderlich · ‎2019-12-04

If you don't need to display the keys as data consider using the new AutoNumber script statement to reduce your RAM usage.

AutoNumber Key;

The count distinct question has some complexities. Perhaps I'll write a blog post about it...

-Rob

Another approach to implement Count distinct

@rwunderlich

@rwunderlich @marcus_sommer @Oleg_Troyansky

@sunny_talwar