Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
pjonathon
Contributor
Contributor

Set Intersection

Having trouble understanding and implementing set intersection. I think I am following the correct syntax but the outcome is different to what I expect.

What I am trying to implement is the following: (A or B) AND (C or D)

I only want to count items that were sold in Region A or B and region C or D

Here is my load script:

Sales:
Load * inline

[

Item,Region
Item1,A
Item2,B
Item3,C
Item4,D
Item1,D
Item2,C
];

Here is my formula: count(distinct{<Region={'A','B'},Region={'C','D'}>} Item)

I expect to get a count of 2, since Item1 was sold in region A and D and Item2 was sold in region B and C

Does anyone have an idea on what is wrong with my formula?

 

Labels (1)
1 Solution

Accepted Solutions
marcus_sommer

I think this kind of intersection isn't solvable with a single aggregation because Region could have only one value on a record-level and therefore each AND condition must fail. But within a chart with Item as dimension you could use something like this:

count({< Region = {'A', 'B'}>} Region) * count({< Region = {'C', 'D'}>} Region)

and should it be counted within a KPI it might be wrapped with an aggr() like:

sum(aggr(count({< Region = {'A', 'B'}>} Region) * count({< Region = {'C', 'D'}>} Region), Item))

- Marcus

View solution in original post

7 Replies
Anil_Babu_Samineni

I know your intend, but you can simplify like below if output expected same

Count({<item={"= count(item)>1">} item)

Or, your way may be

Count({<region={'A', 'B'}>+<region={'C', 'D'>} item)

Or, this also optimised if you have many records

Count(if((region='A' or region='B') and (region='C' or region='D'), item))

Best Anil, When applicable please mark the correct/appropriate replies as "solution" (you can mark up to 3 "solutions". Please LIKE threads if the provided solution is helpful
rogerpegler
Creator II
Creator II

Try:

count(distinct{<Region={'A','B'}>*<Region={'C','D'}>} Item)

The '*' gives you an intersection between each set defined by <..>. 

pjonathon
Contributor
Contributor
Author

The first formula count(distinct{<Region={'A','B'}>+<Region={'C','D'}>} Item) results in 6. Which is not what I am looking for.  Since "+" represents Union I see why I get 6. I am looking for Intersection.

Count(if( (Region='A' or Region='B') and (Region='C' or Region='D'), Item)) results in 0

 

pjonathon
Contributor
Contributor
Author

count(distinct{<Region={'A','B'}>*<Region={'C','D'}>} Item) results in 0

Anil_Babu_Samineni

Can you please post reality data. I didn't understand the logic from sample data

Best Anil, When applicable please mark the correct/appropriate replies as "solution" (you can mark up to 3 "solutions". Please LIKE threads if the provided solution is helpful
marcus_sommer

I think this kind of intersection isn't solvable with a single aggregation because Region could have only one value on a record-level and therefore each AND condition must fail. But within a chart with Item as dimension you could use something like this:

count({< Region = {'A', 'B'}>} Region) * count({< Region = {'C', 'D'}>} Region)

and should it be counted within a KPI it might be wrapped with an aggr() like:

sum(aggr(count({< Region = {'A', 'B'}>} Region) * count({< Region = {'C', 'D'}>} Region), Item))

- Marcus

pjonathon
Contributor
Contributor
Author

This worked. I am getting the expected results. Explanation was right on - reading the aggr function, it makes sense.