Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Calculating average time between transactions + stdev for each customer

Hi guys,

Would you please suggest me a method to calculate average  number of days between transactions (+ standard deviation) for my customers.

Here is the dataset (column datedifD is added just for information and is not actually present in datamodel):

custIDtransIDtransDatetransValuedatedifD
1110.03.201250
1224.03.20122014
1329.03.2012105
1412.04.20129014
1505.05.20129023
1609.05.2012904
1701.06.20129023
1826.06.20128025
1929.06.20121003
11016.07.20129017
11117.07.2012201
11206.08.20126020
11314.08.2012508
11411.09.20129028
11523.09.20127012
2122.08.201230
2201.09.20121010
2308.09.2012207
2430.09.20122022
2523.10.20123023
2601.11.20121009
2717.11.20123016
2822.11.2012705
3119.02.201050
3207.03.201010016
3302.04.20104026
3422.04.20105020
3511.05.20105019
4103.01.2012100
4211.01.20121008
4314.01.2012403
4404.02.20125021
4508.02.2012404
4614.02.2012906
4715.02.2012901
4818.02.2012903
4907.03.20123018
41024.03.20122017
41114.04.20124021
41201.05.20122017
41304.05.2012903
41421.05.20121017
41505.06.20128015
41604.07.20121029
41717.07.201210013
41812.08.20122026
41902.09.20129021
42008.09.2012206
42101.10.20122023
5104.02.201150
5217.02.20118013
5304.03.20119015
5420.03.20111016
5509.04.20113020
5601.05.20111022
5712.05.20116011
5817.05.2011705
5924.05.2011507
51002.06.2011909
51121.06.201110019
51217.07.20114026
51308.08.20115022
51422.08.20118014
51506.09.20111015

And here is the desired output

custIDaverage time (d) between transactionsstandard deviation
114.07                            8.92
213.14                            7.24
320.25                            4.19
413.60                            8.69
515.29                            6.08

Thanks a lot in advance

Dmitry

P.S. Forgot to mention, the data already loaded. I cannot make changes in the load script.

Message was edited by: excelgod

1 Solution

Accepted Solutions
swuehl
MVP
MVP

Instead of above expression, you could also use

=avg(aggr( transDate -above(transDate),custID,transID))

[NOTE: to get correct results, the load order of transID must be ascending, i.e. your records mustn't come in all mixed up. In your sample data, that looks ok to me]

Then the standard deviation could just be

=stdev(aggr( transDate -above(transDate),custID,transID))

View solution in original post

11 Replies
Anonymous
Not applicable
Author

Here you go:

LOAD custID,

     avg(datedifD) as datediffAvg

FROM

transactions.xls

(biff, embedded labels, table is values$)

group by custID;

Not applicable
Author

Thanks for reply. I cannot change loading scripts. I should only play with formylas in charts + datedifD is not present in QV, I showed it for information only.

swuehl
MVP
MVP

You can create a straight table with dimension custID and as expression:

=(max(transDate)-min(transDate)) / (max(transID)-1)

Not applicable
Author

Wow.Thanks.What an elegant solution.It really solves the problem with the mean...

I also need the standard deviation (didn't put it to keep example simpler, my mistake). Here I absolutely need to calculate difference between dates of consecutive transactions, and later to calculate stdev for these differences.

swuehl
MVP
MVP

Instead of above expression, you could also use

=avg(aggr( transDate -above(transDate),custID,transID))

[NOTE: to get correct results, the load order of transID must be ascending, i.e. your records mustn't come in all mixed up. In your sample data, that looks ok to me]

Then the standard deviation could just be

=stdev(aggr( transDate -above(transDate),custID,transID))

Not applicable
Author

thanks I'll check immediately

Not applicable
Author

you are supeman. works perfectly

Not applicable
Author

is there a way to sort transaction date on the fly (in the formula) in case there is no order or the order is not suitable?

swuehl
MVP
MVP

Unfortunately, you can't define a sort order in the advanced aggregation function aggr(). I believe it will always use the load order for sorting its dimension values.