<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Quick Tips #5 - Expressions, count(distinct &amp;lt;fieldname&amp;gt;) in Scalability</title>
    <link>https://community.qlik.com/t5/Scalability/Quick-Tips-5-Expressions-count-distinct-lt-fieldname-gt/m-p/1487430#M1303</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The original Quick Tips #5 is to some extent still valid but the solution, and reason given, is not.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Every click will trigger three main phases.&lt;/P&gt;&lt;P&gt;The initial phase filters the data with the new selection. This is multi-threaded. The next phase is single-threaded, per layout object. The last phase is the calculation phase and each object will do this multi-threaded.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It is not count(distinct &amp;lt;fieldname&amp;gt; that is single-threaded but the fact that fields often uses with this function are often located in distant tables, in the data model. That will force a more complex phase two.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The key is to shorten the distance between the fields used for the dimension and the expression. If they are in the same table then all phases will be very quick.&lt;/P&gt;&lt;P&gt;Using the Task Manager to identify objects that spend too much time in phase two is still a valid tip but the "solution" below is not the optimal way to mitigate the extended phase two.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Functions are multi-threaded, but gathering the data for the calculation is not. The focus should be to minimize this phase.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cheers from the Scalability Team&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;&lt;/SPAN&gt; &lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;-----------------&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;&lt;/SPAN&gt; &lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;"Old, proven semi-wrong"-Quick Tips #5.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;When following the Quick Tips #5 you should always check for any occurrence of Count(Distinct &amp;lt;fieldname&amp;gt;) by searching for the word "distinct" in the Expression Overview. Try to avoid this function because it forces QlikView to switch from using many cores to use only one core. You can verify this by checking the utilization of the different cores in Task Manager as suggested in Quick Tips #1.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;Normally the purpose of the count-distinct is to count the number of unique dimensional values like - how many customers has bought a product?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;A technique to solve this in a multi-threaded way is to create a constant value of 1 in the dimension table and then use a sum() of that field instead.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;Cheers from the Scalability Team.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Mon, 13 Jun 2011 12:14:05 GMT</pubDate>
    <dc:creator>lse</dc:creator>
    <dc:date>2011-06-13T12:14:05Z</dc:date>
    <item>
      <title>Quick Tips #5 - Expressions, count(distinct &lt;fieldname&gt;)</title>
      <link>https://community.qlik.com/t5/Scalability/Quick-Tips-5-Expressions-count-distinct-lt-fieldname-gt/m-p/1487430#M1303</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The original Quick Tips #5 is to some extent still valid but the solution, and reason given, is not.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Every click will trigger three main phases.&lt;/P&gt;&lt;P&gt;The initial phase filters the data with the new selection. This is multi-threaded. The next phase is single-threaded, per layout object. The last phase is the calculation phase and each object will do this multi-threaded.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It is not count(distinct &amp;lt;fieldname&amp;gt; that is single-threaded but the fact that fields often uses with this function are often located in distant tables, in the data model. That will force a more complex phase two.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The key is to shorten the distance between the fields used for the dimension and the expression. If they are in the same table then all phases will be very quick.&lt;/P&gt;&lt;P&gt;Using the Task Manager to identify objects that spend too much time in phase two is still a valid tip but the "solution" below is not the optimal way to mitigate the extended phase two.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Functions are multi-threaded, but gathering the data for the calculation is not. The focus should be to minimize this phase.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cheers from the Scalability Team&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;&lt;/SPAN&gt; &lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;-----------------&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;&lt;/SPAN&gt; &lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;"Old, proven semi-wrong"-Quick Tips #5.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;When following the Quick Tips #5 you should always check for any occurrence of Count(Distinct &amp;lt;fieldname&amp;gt;) by searching for the word "distinct" in the Expression Overview. Try to avoid this function because it forces QlikView to switch from using many cores to use only one core. You can verify this by checking the utilization of the different cores in Task Manager as suggested in Quick Tips #1.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;Normally the purpose of the count-distinct is to count the number of unique dimensional values like - how many customers has bought a product?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;A technique to solve this in a multi-threaded way is to create a constant value of 1 in the dimension table and then use a sum() of that field instead.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US"&gt;Cheers from the Scalability Team.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 13 Jun 2011 12:14:05 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Scalability/Quick-Tips-5-Expressions-count-distinct-lt-fieldname-gt/m-p/1487430#M1303</guid>
      <dc:creator>lse</dc:creator>
      <dc:date>2011-06-13T12:14:05Z</dc:date>
    </item>
  </channel>
</rss>

