Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

distinct count issue

hi,

i have a data input about products, each with a number of product versiones.

each version has a status (enabled, disabled, deleted) and a product that had a deleted status in one of its versions cannot move to enabled in a later version.

now, im trying to calculate the total number of distinct products that are enabled.

calculating count (distinct product id) and filtering on status will not be good since if a product is deleted at its latest version, it still had an earlier version enabled so the filter will keep the earlier versions and the count distinct will count the product though it shouldnt....

data looks somthing like this:

product idproduct version id
status
1234561enabled
1234562enabled
1234563enabled
4567891enabled
4567892deleted
7890001enabled
7890002enabled

so, for the above sample my count should return '2'  since second product was deleted on its second version but a regular coun distinct will return 3.

i was thinking about somthing like selecting the max version per product and for that filter out the deleted and only than count the products.

any ideas?

1 Solution

Accepted Solutions
Miguel_Angel_Baeyens

Hi,

An expression like the following will work:

=Count(Aggr(Only(If(Aggr(FirstSortedValue(status, -[product version id]), [product id]) <> 'deleted', [product id])), [product id]))

However, it's quite tricky and depending on the number of rows your application is handling, it may take a long time to calculate. So, you rather get the value in the script, using Peek() and Previous() functions:

Original:

LOAD [product id],

     [product version id],

     status

FROM

Distinct.xls

(biff, embedded labels, table is Sheet1$);

Final:

LOAD *,

     If([product id] <> Previous([product id]), If(status = 'deleted', 0, 1)) AS ProductCheck

RESIDENT Original // do not use resident loads with large tables

ORDER BY [product id], [product version id] DESC;

DROP TABLE Original;

The expression now is as simple as

Sum(ProductCheck)

Hope that helps.

Miguel

View solution in original post

3 Replies
Miguel_Angel_Baeyens

Hi,

An expression like the following will work:

=Count(Aggr(Only(If(Aggr(FirstSortedValue(status, -[product version id]), [product id]) <> 'deleted', [product id])), [product id]))

However, it's quite tricky and depending on the number of rows your application is handling, it may take a long time to calculate. So, you rather get the value in the script, using Peek() and Previous() functions:

Original:

LOAD [product id],

     [product version id],

     status

FROM

Distinct.xls

(biff, embedded labels, table is Sheet1$);

Final:

LOAD *,

     If([product id] <> Previous([product id]), If(status = 'deleted', 0, 1)) AS ProductCheck

RESIDENT Original // do not use resident loads with large tables

ORDER BY [product id], [product version id] DESC;

DROP TABLE Original;

The expression now is as simple as

Sum(ProductCheck)

Hope that helps.

Miguel

Anonymous
Not applicable
Author

Hi,

I might have misunderstood your question but would something like this solve your problem?

 

Count({$<status={enabled},[product version id]={$(=Max([product version id]))>} DISTINCT [product id])

Not applicable
Author

niromo,

i think your solution will not be good enough since it compares the max version id, not taking underconsideration that each product has a different max version id.

miguel,

thank you for your reply. i'll give it a try and let you know what i come up with...