Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a table:
LOAD * INLINE [ Onum, CID, Date
516, 1, 2020-10-05
557, 2, 2021-04-09
667, 3, 2020-11-15
788, 3, 2021-02-08
556, 2, 2020-10-31
556, 3, 2021-08-09
333, 1, 2021-09-09
444, 1, 2020-10-01
];
I am attempting to do a cohort analysis, so I am trying to create a column with the cohort data (Year-Month they first purchased ), and rebills for every `CID`.
For the cohort, the logic would be something like
Psuedo-code
cohort_column = table.groupby('CID')['Date'] \
.transform('min') \
.to_period(Y-M)
I want to add two additional columns: One with the cohort date and one with gives the purchase count with respect to the date. So my table should look like:
LOAD * INLINE [
Onum, CID, Date, cohort, count
516, 1, 2020-10-05, 2020-10, 2
333, 1, 2021-09-09, 2020-10, 3
444, 1, 2020-10-01, 2020-10, 1
557, 2, 2021-04-09, 2020-09, 2
556, 2, 2020-09-31, 2020-09, 1
667, 3, 2020-11-15, 2020-11, 1
788, 3, 2021-02-08, 2020-11, 2
556, 3, 2021-08-09, 2020-11, 3 ];
If I understand your question, try with this script
t:
LOAD Onum, CID, Date(Date#(Date2, 'YYYY-MM-DD')) as Date2 INLINE [
Onum, CID, Date2, cohort, count
516, 1, 2020-10-05, 2020-10, 2
333, 1, 2021-09-09, 2020-10, 3
444, 1, 2020-10-01, 2020-10, 1
557, 2, 2021-04-09, 2020-09, 2
556, 2, 2020-09-30, 2020-09, 1
667, 3, 2020-11-15, 2020-11, 1
788, 3, 2021-02-08, 2020-11, 2
556, 3, 2021-08-09, 2020-11, 3
];
// calc cohort with a group by
LEFT JOIN (t)
LOAD
CID,
Date(min(Date2), 'YYYY-MM') as cohort
Resident t
GROUP BY CID;
// calc count with a resident load order by
t3:
LOAD
*,
IF(cohort = Previous(cohort), peek('count')+1, 1) as count
Resident t
ORDER BY cohort, Date2;
DROP Table t;
Thanks a lot for the response @maxgro — I will attempt to implement your solution and either follow up with you/mark it as a working solution sometime today.
Hi @maxgro , the first part of your solution works well!
However, the code to calculate the counts might not be the solution I am looking for. I believe it's iterating as long as the previous cohort value is the same. However, in some instances users `CSID` have the same cohort so it may be attributing counts to other people. Is there anyway to change the criteria where your code applies only if its the same 'CSID'? Thanks.