Skip to main content
Announcements
Qlik Introduces a New Era of Visualization! READ ALL ABOUT IT
cancel
Showing results for 
Search instead for 
Did you mean: 
Mitch_Data
Contributor III
Contributor III

Set Analysis Count + Average

Hi Guys,

 

I have a question regarding a set analysis.

What I want: I want to calculate the average result where category value <=20000 and only when the distinct count exceeds >25. So I need to have 25 or more contracts present with a category value <=20000 before I calculate the average result. However, below formula doesn't seem to work.

 

This is what I have:

 

Avg({<CategoryValue={"<=20000"}, Contractnumber= {"=Count(Contractnumber) >=25"}>} Result)

 

Could anyone provide guidance, many thanks!

Labels (3)
3 Replies
stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi @Mitch_Data 

Unfortunately Set Analysis doesn't work like what you are trying. There is a function P() that you can use in Set Analysis that almost does what you want, but doesn't quite.

I think that you are going to want an AGGR function to work out which contract numbers results you are interested in. Something like:

avg(aggr(if(count({<CategoryValue={"<=20000"}>}Contractnumber) >= 25, sum(Result), null()), Contractnumber))

This will work out the total of all Results for each Contractnumber, where there are more than 25 occurrences of that Contractnumber and then average them.

Note that it will be the average per Contractnumber, not unique line.

Note also that the Count is slightly ambiguous here, and if you have a unique TransactionID you will be better off doing a COUNT(DISTINCT of that, rather than counting non unique values. Better still add a counter to each row and sum that (i.e. 1 as TransCount, in the load script).

If you want the average of all results then you probably need to get a comma separated list of contract numbers to feed into the set analysis. Something like:

avg({<Contractnumber={$(=concat(distinct aggr(if(count({<CategoryValue={"<=20000"}>}Contractnumber) >= 25, Contractnumber, null()), Contractnumber), ','))}>}Result)

I've almost certainly not got the brackets right there, but it may give you the components you need to put this together. It will only work if Contractnumber is a number, otherwise you will need to use chr(39) to build out the string of numbers for the set analysis.

Test the part within the $() on it's own to ensure that gives the list of numbers how we want it, just use the title of a chart or a text box to test just that part, i.e.

concat(distinct aggr(if(count({<CategoryValue={"<=20000"}>}Contractnumber) >= 25, Contractnumber, null()), Contractnumber), ',')

Hope that all makes some kind of sense?

Steve

Mitch_Data
Contributor III
Contributor III
Author

Hi Steve,

 

This is not exactly what I want, I try to explain it more thoroughly,  perhaps I was too vague :):

Let's use brand instead of contractnumber (brand = the brand of the car e.g. Mercedes), this might make it easier to understand.

I have brand as a dimension in my bar graph, now for the measure I want this:

I want the average result per brand where the CategoryValue <=20000 and that brand needs to represented more than 25 times in my dataset (within that categoryvalue of <=20000).

Basically only average if the CategoryValue <=20000 and the brand occurs more than 25 times within that categorygroup. So I do not want to use distinct because Mercedes will for instance occur many times but I want to count those instances.

Steve, in any case thank you for the feedback I hope you can provide me with some insights!

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi @Mitch_Data 

Okay, so I think then what I had in the second example is probably along the right lines.

Assuming Mercedes, BMW and Volkswagen are the three brands that appear more than 25 times within the category, you would want the following set analysis statement:

avg({<Brand={'Mercedes','BMW','Volkswagen'}>}Result)

That's nice and simple!

The tricky part is that we need to build that list of brands in code, because new brands could pass the 25 threshold from one reload to the next.

I would suggest using a variable, to split things up a bit. So, that code becomes:

avg({<Brand={$(v25PlusBrands)}>}Result)

And then we just need to get the variable right.

I would get this working in the Title of a chart (the easiest place to test expressions) first. The code I had was broadly right, but you need to add the quotations and I've changed the count(Brand) for sum(1), which should give the same value - but sum(1) may be easier to read:

=chr(39) & concat(distinct aggr(if(sum({<CategoryValue={"<=20000"}>}1) > 25, Brand, null()), Brand), chr(39) & ',' & chr(39)) & chr(39)

So, to break that down, you have a single quote, a concatenation of each brand with ',' in between each and then a single quote on the end.

If you want this expression to ignore selections you will want to put a in the set analysis, so it reads {1<.

Once it is working, place that code into the v25PlusBrands variable and off you go.

Good luck!

Steve