2 Replies Latest reply: Jun 24, 2016 10:43 AM by Michael Granillo RSS

    Count Distinct Accumulation in Qlik Sense

    Michael Granillo

      It seems there are a lot of posts covering count distinct accumulation over time but none of which worked for me.

       

      Here is a solution I made that seems simple.

       

      I sorted my table in the load script by my dimension (month in my case).  Then i used the following chart expression:

       

      rangesum(

          above(

                sum(

                      aggr(

                            count(distinct "Name")

                    ,"Name")

                    )

                ,0,RowNo())

      )

       

      Obviously, you'll need to replace "Name" with the column you are distinctly counting.  The result will be a nice accumulating line like below:

      6e369a9e76d94bc5bc62cf046f2ff66a.png

       

      EDIT: For those curious about how this is working, it is an unintended result from a grain mis-match on the aggr function (not recommended but it does work).  To explain, start by ignoring the "rangesum(above(" part of the expression.  This is the part needed for the final accumulation.  The inner part: "sum(aggr(count(distinct "Name"),"Name"))"  is where we see strange behavior.

       

      Starting with our sample data:

       

      LOAD * INLINE [

      Month, Name

      1,A

      1,B

      1,C

      1,D

      1,E

      2,F

      2,G

      2,A

      2,B

      2,C

      2,D

      3,H

      3,A

      3,B

      3,C

      ];

       

      Qlik evaluates sum(aggr(count(distinct "Name"),"Name"))  by first applying it to our month dimension as in any chart (Month=1).  It then jumps to the next dimension we've specified in the aggr function, "Name", and evaluates all instances of the given name.  So "1,A", "2,A","3,A" are all evaluated for month =1 even though Month=1 is not the month associated with "2,A" and "3,A".  Count(distinct Name) is then applied to these three rows and returns a count of 1.  These steps are applied for each "Name" in Month =1 and returns a final count of 5.

       

      Processing then jumps to month 2 and the evaluation steps continue but rows "2,A" through "2,D" are skipped because they were already evaluated when Month equaled 1.  The final count returned for Month=2 is 2 ("2,F" and "2,G" are counted).  Month 3 returns a count of 1 because all rows have been evaluated in previous steps with the exception of "3,H".  The final result is this table:

                                        

      MonthDistinct Count by Month
      15
      22
      31

       

      The reason this method is not recommended is because Qlik doesn't know how to apply the inner dimension (Name) with the outer dimension (Month).  The load order is the only factor here that allows this to work.  We've essentially gotten lucky that our load order happens to make this work.  Had our table been loaded differently we would get a different result.

       

      Say I move my second row, "1,B" to the bottom of my data load:

       

      LOAD * INLINE [

      Month, Name

      1,A

       

      1,C

      1,D

      1,E

      2,F

      2,G

      2,A

      2,B

      2,C

      2,D

      3,H

      3,A

      3,B

      3,C

      1,B

      ];

       

      The distinct count will now be 4,3,1 instead of the previous 5,2,1

      MonthDistinct Count by Month
      14
      23
      31

       

       

      If we add in the accumulation part of the function ( "rangsum(above(" ) we arrive at the same final accumulation = 8 but the way we got there is different.  See the attached sample app for these two examples.

       

      Happy Qliking!

       

      Mike