Quick Tips #5 - Expressions, count(distinct <fieldname>)

The original Quick Tips #5 is to some extent still valid but the solution, and reason given, is not.

 

Every click will trigger three main phases.

The initial phase filters the data with the new selection. This is multi-threaded. The next phase is single-threaded, per layout object. The last phase is the calculation phase and each object will do this multi-threaded.

 

It is not count(distinct <fieldname> that is single-threaded but the fact that fields often uses with this function are often located in distant tables, in the data model. That will force a more complex phase two.

 

The key is to shorten the distance between the fields used for the dimension and the expression. If they are in the same table then all phases will be very quick.

Using the Task Manager to identify objects that spend too much time in phase two is still a valid tip but the "solution" below is not the optimal way to mitigate the extended phase two.

 

Functions are multi-threaded, but gathering the data for the calculation is not. The focus should be to minimize this phase.

 

Cheers from the Scalability Team

 

-----------------

 

"Old, proven semi-wrong"-Quick Tips #5.

 

When following the Quick Tips #5 you should always check for any occurrence of Count(Distinct <fieldname>) by searching for the word "distinct" in the Expression Overview. Try to avoid this function because it forces QlikView to switch from using many cores to use only one core. You can verify this by checking the utilization of the different cores in Task Manager as suggested in Quick Tips #1.

 

Normally the purpose of the count-distinct is to count the number of unique dimensional values like - how many customers has bought a product?

 

A technique to solve this in a multi-threaded way is to create a constant value of 1 in the dimension table and then use a sum() of that field instead.

 

Cheers from the Scalability Team.