Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi guys,
Would you please suggest me a method to calculate average number of days between transactions (+ standard deviation) for my customers.
Here is the dataset (column datedifD is added just for information and is not actually present in datamodel):
custID | transID | transDate | transValue | datedifD |
1 | 1 | 10.03.2012 | 50 | |
1 | 2 | 24.03.2012 | 20 | 14 |
1 | 3 | 29.03.2012 | 10 | 5 |
1 | 4 | 12.04.2012 | 90 | 14 |
1 | 5 | 05.05.2012 | 90 | 23 |
1 | 6 | 09.05.2012 | 90 | 4 |
1 | 7 | 01.06.2012 | 90 | 23 |
1 | 8 | 26.06.2012 | 80 | 25 |
1 | 9 | 29.06.2012 | 100 | 3 |
1 | 10 | 16.07.2012 | 90 | 17 |
1 | 11 | 17.07.2012 | 20 | 1 |
1 | 12 | 06.08.2012 | 60 | 20 |
1 | 13 | 14.08.2012 | 50 | 8 |
1 | 14 | 11.09.2012 | 90 | 28 |
1 | 15 | 23.09.2012 | 70 | 12 |
2 | 1 | 22.08.2012 | 30 | |
2 | 2 | 01.09.2012 | 10 | 10 |
2 | 3 | 08.09.2012 | 20 | 7 |
2 | 4 | 30.09.2012 | 20 | 22 |
2 | 5 | 23.10.2012 | 30 | 23 |
2 | 6 | 01.11.2012 | 100 | 9 |
2 | 7 | 17.11.2012 | 30 | 16 |
2 | 8 | 22.11.2012 | 70 | 5 |
3 | 1 | 19.02.2010 | 50 | |
3 | 2 | 07.03.2010 | 100 | 16 |
3 | 3 | 02.04.2010 | 40 | 26 |
3 | 4 | 22.04.2010 | 50 | 20 |
3 | 5 | 11.05.2010 | 50 | 19 |
4 | 1 | 03.01.2012 | 100 | |
4 | 2 | 11.01.2012 | 100 | 8 |
4 | 3 | 14.01.2012 | 40 | 3 |
4 | 4 | 04.02.2012 | 50 | 21 |
4 | 5 | 08.02.2012 | 40 | 4 |
4 | 6 | 14.02.2012 | 90 | 6 |
4 | 7 | 15.02.2012 | 90 | 1 |
4 | 8 | 18.02.2012 | 90 | 3 |
4 | 9 | 07.03.2012 | 30 | 18 |
4 | 10 | 24.03.2012 | 20 | 17 |
4 | 11 | 14.04.2012 | 40 | 21 |
4 | 12 | 01.05.2012 | 20 | 17 |
4 | 13 | 04.05.2012 | 90 | 3 |
4 | 14 | 21.05.2012 | 10 | 17 |
4 | 15 | 05.06.2012 | 80 | 15 |
4 | 16 | 04.07.2012 | 10 | 29 |
4 | 17 | 17.07.2012 | 100 | 13 |
4 | 18 | 12.08.2012 | 20 | 26 |
4 | 19 | 02.09.2012 | 90 | 21 |
4 | 20 | 08.09.2012 | 20 | 6 |
4 | 21 | 01.10.2012 | 20 | 23 |
5 | 1 | 04.02.2011 | 50 | |
5 | 2 | 17.02.2011 | 80 | 13 |
5 | 3 | 04.03.2011 | 90 | 15 |
5 | 4 | 20.03.2011 | 10 | 16 |
5 | 5 | 09.04.2011 | 30 | 20 |
5 | 6 | 01.05.2011 | 10 | 22 |
5 | 7 | 12.05.2011 | 60 | 11 |
5 | 8 | 17.05.2011 | 70 | 5 |
5 | 9 | 24.05.2011 | 50 | 7 |
5 | 10 | 02.06.2011 | 90 | 9 |
5 | 11 | 21.06.2011 | 100 | 19 |
5 | 12 | 17.07.2011 | 40 | 26 |
5 | 13 | 08.08.2011 | 50 | 22 |
5 | 14 | 22.08.2011 | 80 | 14 |
5 | 15 | 06.09.2011 | 10 | 15 |
And here is the desired output
custID | average time (d) between transactions | standard deviation |
1 | 14.07 | 8.92 |
2 | 13.14 | 7.24 |
3 | 20.25 | 4.19 |
4 | 13.60 | 8.69 |
5 | 15.29 | 6.08 |
Thanks a lot in advance
Dmitry
P.S. Forgot to mention, the data already loaded. I cannot make changes in the load script.
Message was edited by: excelgod
Instead of above expression, you could also use
=avg(aggr( transDate -above(transDate),custID,transID))
[NOTE: to get correct results, the load order of transID must be ascending, i.e. your records mustn't come in all mixed up. In your sample data, that looks ok to me]
Then the standard deviation could just be
=stdev(aggr( transDate -above(transDate),custID,transID))
Here you go:
LOAD custID,
avg(datedifD) as datediffAvg
FROM
transactions.xls
(biff, embedded labels, table is values$)
group by custID;
Thanks for reply. I cannot change loading scripts. I should only play with formylas in charts + datedifD is not present in QV, I showed it for information only.
You can create a straight table with dimension custID and as expression:
=(max(transDate)-min(transDate)) / (max(transID)-1)
Wow.Thanks.What an elegant solution.It really solves the problem with the mean...
I also need the standard deviation (didn't put it to keep example simpler, my mistake). Here I absolutely need to calculate difference between dates of consecutive transactions, and later to calculate stdev for these differences.
Instead of above expression, you could also use
=avg(aggr( transDate -above(transDate),custID,transID))
[NOTE: to get correct results, the load order of transID must be ascending, i.e. your records mustn't come in all mixed up. In your sample data, that looks ok to me]
Then the standard deviation could just be
=stdev(aggr( transDate -above(transDate),custID,transID))
thanks I'll check immediately
you are supeman. works perfectly
is there a way to sort transaction date on the fly (in the formula) in case there is no order or the order is not suitable?
Unfortunately, you can't define a sort order in the advanced aggregation function aggr(). I believe it will always use the load order for sorting its dimension values.