Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
jstensig
Contributor III
Contributor III

Calculate "Rolling average" in script

Hi,

I´m doing some investigation on our Purchase-transactiondata and need to calculate a 3 month Rolling average per ProductID to identify "bad data" / outliers.

The reason for trying to do it in the load script, is 1) that I have millions of records and 2) hoping to do some automation in the process of handling the "bad data" / outliers.

I know that I can do something similar in Chart expressions, but finding it very hard to identify and handle "bad data" / outliers using Charts = a lot of manual work

Any ideas out there?

5 Replies
martinpohl
Partner - Master
Partner - Master

Hello,

I would suggest following structur:

load

sum(value) as Sumvalue

year/period

ProductID

resident your data group by year/period, ProductID where not bad data

then

give your year/period a number, oder by year / period

eg 201101=1, 201102 = 2, 201111=11, 201112=12, 201201=13, 201202=14 and so on.

Then calculate in a loop

sum(Sumvalue)

year/period

ProductID

resident Sumtable group by year/period, ProductID where periodnumber <= loop -3

Regards

lironbaram
Partner - Master III
Partner - Master III

hi check the attach example

jstensig
Contributor III
Contributor III
Author

Hi both,

Interesting - right now I´m having a look at your suggestions and will get back later, hopefully today.

Thanks for sharing 🙂

jstensig
Contributor III
Contributor III
Author


Hi Martin,
You pointed me in the right direction - ended up with a 2-step solution using WHILE and IterNo() for ordinary load and afterwards aggregation


// Loading each purchase-line up to 3 times (WHILE-syntax) and "moving" DeliveryMonth (DeliveryMonth_new) accordingly
PurchaseLines_3mth:
LOAD ProductID,Amount,Cost
,AddMonths(DeliveryMonth,IterNo()-1) AS DeliveryMonth_new
Resident PurchaseLines
while AddMonths(AddMonths(DeliveryMonth,-2),IterNo()-1) <= DeliveryMonth


// Aggregate by ProductID,DeliveryMonth_new
PurchaseLines_aggr:
NoConcatenate
LOAD
ProductID,DeliveryMonth_new
,Sum(Amount) AS AmtTotal3mth,Sum(Cost) AS CostTotal3mth,Count(ProductID) AS LinesTotal3mth
Resident PurchaseLines_3mth
Group BY ProductID,DeliveryMonth_new;


Drop table PurchaseLines_3mth;


The aggregated data is joined back on transactiondata and used for further calculations

jstensig
Contributor III
Contributor III
Author

Hi liron,
Thanks for mentioning the RangeAvg-function - I can use that in further calculations in my project