3 Replies Latest reply: Apr 25, 2014 8:59 AM by Mike Swinn

# Median of Aggregate to Consider Nulls/Missing Data

I am looking to produce a chart showing median sales by customer for each month since their signup. My data is at transaction level and I have coded a field for the number of months within the load script.

E.g.

Customer          months_since_signup          sale_value

A                         1                                             5.50

A                         1                                             4.30

A                         1                                             6.50

B                         1                                             3.60

A                         2                                             5.40

A                         4                                             6.50

B                         5                                             7.00

I started by setting months_since_sale as a dimension and an expression similar to the following:

median(aggr(sum(sale_value, customer_ID)))

This does give median sales but only across customers who have a sale in a given month. It does not consider customers who have no sales since there are no data points in the first place. The result is therefore meaningless since customers who drop off over time are ignored.

So for example if I have 5 customers, with sales in their second month of:

5, 4, 3, (nothing), (nothing)

Then the expression would produce a 'median' of 4 since it only considered the first three values. I want it to take the median of

5, 4, 3, 0, 0

...and produce 3 as the result.

What I'd like to do is somehow use an intermediate table which contains a record for every customer in every month, ideally within my expression rather than data load. I'm not sure this is possible with the aggr() function.

The mean is simple by working it differently as:

sum(sale_value) / count(distinct total customer_ID)

But I can't use this method for median.

Any ideas? I'm thinking a detatched dimension may be required but I want to keep it as simple as possible.

• ###### Re: Median of Aggregate to Consider Nulls/Missing Data

If your customer master table has all these (5,4,3,2,1) customers and no sales for 2,1 in transaction table, your Aggr() approach should work. See the attachment.

• ###### Re: Median of Aggregate to Consider Nulls/Missing Data

No it should work. All you need to do is to use the total qualifier for the Median function. See the last column of the pivot table below.

HIC

• ###### Re: Median of Aggregate to Consider Nulls/Missing Data

Thanks for the replies, I didn't explain clearly what I wanted as the output though, and my aggr function was slightly wrong, it should have also contained month.

I need the median sales per customer for each month, I have tried the TOTAL method but it doesn't consider the missing customers, i.e. in months 1 and 3 only two of the three customers have sales, and the median is computed across only these two.

See the attached example and excel data files (M is month). The second table/file ('Median Aggr with Zeros') gives the output I want but only because I have manually added a row with a sales value of 0 for months in which customers have sales. In that Excel file I have highlighted the points which represent the median in each month.

I'd like to do achieve the result of the second table the same in qlikview but without adding these zeros to the raw data - perhaps via some kind of pivot function or similar?