<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Data model/distinct count challenge in QlikView</title>
    <link>https://community.qlik.com/t5/QlikView/Data-model-distinct-count-challenge/m-p/312136#M115236</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Create an AccountNumberCount Table with two fields:&lt;/P&gt;&lt;P&gt;AccountNumber -- all the DISTINCT AccountNumbers in your data.&lt;/P&gt;&lt;P&gt;AccountNumberCounter = 1&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;AccountNumber, whether in a dimension or a fact, will point to AccountNumber in the Table. Then sum(AccountNumberCounter) will give you the correct count. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Depending on your model, this may create a loop. You can fix that by using a different link field for each table to the AccountNumberCountTable. Use autonumber() for the links and they won't take up much space.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;-Rob&lt;/P&gt;&lt;P&gt;&lt;A class="jive-link-external-small" href="http://robwunderlich.com"&gt;http://robwunderlich.com&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Thu, 28 Jul 2011 01:05:00 GMT</pubDate>
    <dc:creator>rwunderlich</dc:creator>
    <dc:date>2011-07-28T01:05:00Z</dc:date>
    <item>
      <title>Data model/distinct count challenge</title>
      <link>https://community.qlik.com/t5/QlikView/Data-model-distinct-count-challenge/m-p/312134#M115234</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;SPAN style="color: #000000; font-size: 12pt; font-family: Calibri;"&gt;Hello,&lt;/SPAN&gt;&lt;/P&gt;&lt;P class="MsoNormal" style="margin: 0in 0in 10pt;"&gt;&lt;SPAN style="color: #000000; font-size: 12pt; font-family: Calibri;"&gt;my requirement is to be able to compute a distinct count of Account Numbers&lt;SPAN style="mso-spacerun: yes;"&gt; &lt;/SPAN&gt;based on any selections the user makes. &lt;/SPAN&gt;&lt;/P&gt;&lt;P class="MsoNormal" style="margin: 0in 0in 10pt;"&gt;&lt;SPAN style="color: #000000; font-size: 12pt; font-family: Calibri;"&gt;Doing a distinct count worked OK with a small data set. The challenge comes when we increased the data volume (now being over 400 million records) which then causes performance issues.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class="MsoNormal" style="margin: 0in 0in 10pt;"&gt;&lt;SPAN style="color: #000000; font-size: 12pt; font-family: Calibri;"&gt;We tried aggregating the data on the database side by computing the distinct count in the database table at the various combinations of all the dimensions required&lt;SPAN style="mso-spacerun: yes;"&gt; &lt;/SPAN&gt;( ex: at division,network, time of day etc.) , but the problem with that approach is the fact that you can’t sum up a distinct count &lt;SPAN style="mso-spacerun: yes;"&gt; &lt;/SPAN&gt;in the dashboard if the users wanted to see a distinct count at a higher level of granularity because that would result in double counting (or more) the Account Numbers&lt;SPAN style="mso-spacerun: yes;"&gt; &lt;/SPAN&gt;i.e one Account Number can be in multiple divisions, networks, time&lt;SPAN style="mso-spacerun: yes;"&gt; &lt;/SPAN&gt;of day etc.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class="MsoNormal" style="margin: 0in 0in 10pt;"&gt;&lt;SPAN style="color: #000000; font-size: 12pt; font-family: Calibri;"&gt;Has anyone has any experience with similar requirements?&lt;SPAN style="mso-spacerun: yes;"&gt; &lt;/SPAN&gt;Any thoughts on how we can accomplish this whether it involves a different data model or some type of data manipulation we can do in the script or the dashboard?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #000000; font-family: Times New Roman; font-size: 12pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P class="MsoNormal" style="margin: 0in 0in 10pt;"&gt;&lt;SPAN style="color: #000000; font-size: 12pt; font-family: Calibri;"&gt;Thanks, Maria&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #000000; font-family: Times New Roman; font-size: 12pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 27 Jul 2011 21:30:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Data-model-distinct-count-challenge/m-p/312134#M115234</guid>
      <dc:creator />
      <dc:date>2011-07-27T21:30:15Z</dc:date>
    </item>
    <item>
      <title>Data model/distinct count challenge</title>
      <link>https://community.qlik.com/t5/QlikView/Data-model-distinct-count-challenge/m-p/312135#M115235</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Maria,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;you can try making an order by of your table and adding a flag that says if the previous row has the same value, then in your expression you just have to make a sum of your flag; the code should be something like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;load&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fieldA,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fieldB,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fieldToCount,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if(peek('fieldToCount') = fieldToCount,0,1) as Flag&lt;/P&gt;&lt;P&gt;from...&lt;/P&gt;&lt;P&gt;order by fieldToCount Asc;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;after this, the expression should be: sum(Flag)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hope this helps&lt;/P&gt;&lt;P&gt;Regards!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 27 Jul 2011 22:01:19 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Data-model-distinct-count-challenge/m-p/312135#M115235</guid>
      <dc:creator />
      <dc:date>2011-07-27T22:01:19Z</dc:date>
    </item>
    <item>
      <title>Data model/distinct count challenge</title>
      <link>https://community.qlik.com/t5/QlikView/Data-model-distinct-count-challenge/m-p/312136#M115236</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Create an AccountNumberCount Table with two fields:&lt;/P&gt;&lt;P&gt;AccountNumber -- all the DISTINCT AccountNumbers in your data.&lt;/P&gt;&lt;P&gt;AccountNumberCounter = 1&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;AccountNumber, whether in a dimension or a fact, will point to AccountNumber in the Table. Then sum(AccountNumberCounter) will give you the correct count. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Depending on your model, this may create a loop. You can fix that by using a different link field for each table to the AccountNumberCountTable. Use autonumber() for the links and they won't take up much space.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;-Rob&lt;/P&gt;&lt;P&gt;&lt;A class="jive-link-external-small" href="http://robwunderlich.com"&gt;http://robwunderlich.com&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 28 Jul 2011 01:05:00 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Data-model-distinct-count-challenge/m-p/312136#M115236</guid>
      <dc:creator>rwunderlich</dc:creator>
      <dc:date>2011-07-28T01:05:00Z</dc:date>
    </item>
    <item>
      <title>Data model/distinct count challenge</title>
      <link>https://community.qlik.com/t5/QlikView/Data-model-distinct-count-challenge/m-p/312137#M115237</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt; Rob,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;this is a good idea but you are assuming that we are keeping the Account Number in the main table. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We want to aggregate the data because the volume is getting too big. If we aggregate the data, we will loose the Account Number, but still want to be able to compute the distinct count of Account Number at the various levels.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any ideas?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Maria&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 28 Jul 2011 21:11:13 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Data-model-distinct-count-challenge/m-p/312137#M115237</guid>
      <dc:creator />
      <dc:date>2011-07-28T21:11:13Z</dc:date>
    </item>
    <item>
      <title>Re: Data model/distinct count challenge</title>
      <link>https://community.qlik.com/t5/QlikView/Data-model-distinct-count-challenge/m-p/312138#M115238</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I would bet that the simplest and smallest data model required to support your requirement is one in which you haven't aggregated your data by dropping the account numbers.&amp;nbsp; In other words, I would bet that what you want to do is impossible.&amp;nbsp; I can't think how to arrange a proof, though, so I could be wrong.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Gabriela, if I understood your solution, I see a problem with it.&amp;nbsp; Let's take a simple example:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="padding-left: 30px;"&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;AccountType, AccountNumber&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;A, 1&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;A, 2&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;A, 3&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;B, 2&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;B, 3&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;B, 4&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;C, 3&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;C, 4&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;C, 5&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Your solution would set the flag like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="padding-left: 30px;"&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;AccountType, AccountNumber, Flag&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;A, 1, 1&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;A, 2, 1&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;B, 2, 0&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;A, 3, 1&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;B, 3, 0&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;C, 3, 0&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;B, 4, 1&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;C, 4, 0&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;C, 5, 1&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;And then aggregation with sum() to get rid of our account numbers would give us this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="padding-left: 30px;"&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;AccountType, SumFlag&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;A, 3&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;B, 1&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;C, 1&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Select B and C.&amp;nbsp; How many distinct account numbers?&amp;nbsp; The right answer is 4, but this data model tells us 2.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 28 Jul 2011 22:16:10 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Data-model-distinct-count-challenge/m-p/312138#M115238</guid>
      <dc:creator>johnw</dc:creator>
      <dc:date>2011-07-28T22:16:10Z</dc:date>
    </item>
    <item>
      <title>Re: Data model/distinct count challenge</title>
      <link>https://community.qlik.com/t5/QlikView/Data-model-distinct-count-challenge/m-p/312139#M115239</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Maria,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I don't think my solution requires keeping transaction level tables in the model. What I'm suggesting is building up the AccountNumberCount table as you do the aggregation&amp;nbsp; Assume a number of transactions, each with an AccountNumber, Region, Product&amp;nbsp; and Amount. Rollup will be sum(Amount) by Region. Build a link table that links each Region to every AccountNumber in the AccountNumberCount table that appears in that Region. Links would also be built for other rollups such as Product. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So selecting a Region will associate a certain number of rows in the AccountNumberCount table. Selecting products sold in that Region would further narrow the rows in AccountNumberCount. At any given point of selections, =sum(AccountNumberCounter) should give the correct count of AccountNumbers.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;-Rob&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 29 Jul 2011 05:17:30 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Data-model-distinct-count-challenge/m-p/312139#M115239</guid>
      <dc:creator>rwunderlich</dc:creator>
      <dc:date>2011-07-29T05:17:30Z</dc:date>
    </item>
  </channel>
</rss>

