3 Replies Latest reply: Sep 5, 2012 5:58 AM by elanir01 RSS

    distinct count issue

      hi,

       

      i have a data input about products, each with a number of product versiones.

      each version has a status (enabled, disabled, deleted) and a product that had a deleted status in one of its versions cannot move to enabled in a later version.

       

      now, im trying to calculate the total number of distinct products that are enabled.

      calculating count (distinct product id) and filtering on status will not be good since if a product is deleted at its latest version, it still had an earlier version enabled so the filter will keep the earlier versions and the count distinct will count the product though it shouldnt....

       

      data looks somthing like this:

      product idproduct version id
      status
      1234561enabled
      1234562enabled
      1234563enabled
      4567891enabled
      4567892deleted
      7890001enabled
      7890002enabled

      so, for the above sample my count should return '2'  since second product was deleted on its second version but a regular coun distinct will return 3.

       

       

      i was thinking about somthing like selecting the max version per product and for that filter out the deleted and only than count the products.

      any ideas?

        • Re: distinct count issue
          Miguel Angel Baeyens de Arce

          Hi,

           

          An expression like the following will work:

           

          =Count(Aggr(Only(If(Aggr(FirstSortedValue(status, -[product version id]), [product id]) <> 'deleted', [product id])), [product id]))
          

           

          However, it's quite tricky and depending on the number of rows your application is handling, it may take a long time to calculate. So, you rather get the value in the script, using Peek() and Previous() functions:

           

          Original:
          LOAD [product id], 
               [product version id], 
               status
          FROM
          Distinct.xls
          (biff, embedded labels, table is Sheet1$);
          
          Final:
          LOAD *,
               If([product id] <> Previous([product id]), If(status = 'deleted', 0, 1)) AS ProductCheck
          RESIDENT Original // do not use resident loads with large tables
          ORDER BY [product id], [product version id] DESC;
          
          DROP TABLE Original;
          

           

          The expression now is as simple as

           

          Sum(ProductCheck)
          

           

          Hope that helps.

           

          Miguel

          • Re: distinct count issue
            Nils Morris

            Hi,

             

            I might have misunderstood your question but would something like this solve your problem?

             

            Count({$<status={enabled},[product version id]={$(=Max([product version id]))>} DISTINCT [product id])

              • Re: distinct count issue

                niromo,

                 

                i think your solution will not be good enough since it compares the max version id, not taking underconsideration that each product has a different max version id.

                 

                 

                miguel,

                thank you for your reply. i'll give it a try and let you know what i come up with...