Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Aggregate data then averaging techniques

Hi,

I need to aggregate my data to decrease the size of it, but I also need to calculate the average of these aggregated values afterwards. If you aggregate the data first, then calculate the average afterwards (aggregating an aggregation of a value) then the average will be different (sometimes very different) compared to averaging the unaggregated data.

I was wondering if anyone has any suggestions or links to some resources that discuss techniques that can be used to decrease the variance between the two averages when you have to aggregate data? My maths and statistics knowledge isn't great. I assume some kind of weighted averaging might help this, or maybe there are more advanced techniques. Thanks.

3 Replies
tresesco
MVP
MVP

Say, you want to aggregate 'Amount' monthly instead of detailed date wise data, and you need the average also. then what i would do is - simply aggregate the data monthly with two operations SUM and AVG at the time of aggregation simulteneously during load. Like:

LOAD Month, Sum(Amount) as SumofAmount, Avg(Amount) as AmountAvegare

FROM.....Group By Month;

Regards, tresesco

Not applicable
Author

An AVG aggregation does get you closer than SUM but it is still pretty far out. You can see from this very basic example attached the differences when just using simple AVG or SUM aggregation then averaging the result.

tresesco
MVP
MVP

This is obvious. Sum and Average can't be carried together with aggregation again and again. Because the aggregation entity-count varries in different level. I did not understand your requirement properly. In that case (as you want), i believe you have to mantain a count(rather DISTINCT COUNT) for dimension, or you can do that at the front end. The Total amount would not varry for any case of aggregation. So you try with customed formula (for average) like : SUM(Amount)/Count(Distinct YourDimension).

See, if it could be an option.

Regards, tresesco