Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I've got the following table:
deposits_table:
NoConcatenate LOAD
TransactionId,
Date as Timestamp,
Date(DayStart("Date"), 'DD/MM/YYYY') as Date,
TransactionType,
Amount,
CUID
The timestamp is in msc. I would like to add 3 derived columns: daily_deposit_count, monthly_deposit_count, and yearly_deposit_count. The columns will iteratively count the nth transaction for a particular CUID per day/month/year. That is, Let's say a customer deposits twice a day. Their first deposit for that day should have 1 in the daily_deposit_count column and the second deposit should have 2 and then the counter reset for when they day/month/year rolls over. How do I achieve this?
@qasim_00 Instead you can try with peek and previous functions
table_name:
LOAD
*,
if(CUID<>previous(CUID),1,rangesum(peek('DailyTransactionCount'),1)) as DailyTransactionCount
Resident deposits_table
Order By CUID,Date,TimeStamp;
I suspect the AGGR function is what you need here. I'd need to have the data to play about but something like
table_name:
LOAD
TransactionId,
Date as Timestamp,
Date(DayStart("Date"), 'DD/MM/YYYY') as Date,
TransactionType,
Amount,
CUID,
Aggr(Count(TransactionId), CUID, Date) as daily_deposit_count,
Aggr(Count(TransactionId), CUID, Year(Date), Month(Date)) as monthly_deposit_count,
Aggr(Count(TransactionId), CUID, Year(Date)) as yearly_deposit_count
Resident deposits_table
Order By CUID, Date, TransactionId;
@qasim_00 Could you share data example to understand bit more?
Here's some dummy data:
| TransactionId | Timestamp | Date | TransactionType | Amount | CUID |
| 1001 | 45005.496818| 20/03/2023 | Deposit | 899.88 | CUID001 |
| 1002 | 45005.067251| 20/03/2023 | Deposit | 904.37 | CUID001 |
| 1003 | 44998.281421| 13/03/2023 | Deposit | 155.26 | CUID001 |
| 1004 | 44998.433927| 15/03/2023 | Deposit | 175.99 | CUID001 |
| 1005 | 44997.663445| 13/03/2023 | Deposit | 460.27 | CUID001 |
| 1006 | 45011.509363| 28/03/2023 | Deposit | 757.63 | CUID002 |
| 1007 | 45011.492744| 27/03/2023 | Deposit | 209.76 | CUID002 |
| 1008 | 45011.158855| 28/03/2023 | Deposit | 624.74 | CUID002 |
| 1009 | 45003.978103| 20/03/2023 | Deposit | 144.56 | CUID002 |
| 1010 | 44997.531693| 13/03/2023 | Deposit | 549.68 | CUID002 |
I think the Aggr function is not allowed in a load statement?
@qasim_00 Replace Aggr with Window function then try.
table_name:
LOAD
*,
AutoNumber([deposits_table.Timestamp], deposits_table.Concatenated) as DailyTransactionCount
Resident deposits_table
Order By [deposits_table.Timestamp]
Worked with this approach but I didn't know how to get the AutoNumber function to count on 2 grouped columns so I just concatenated their values:
Date(DayStart("Date"), 'DD/MM/YYYY') & CUID as Concatenated
Although this method does take some time to process (around 4 minutes for 80,000 rows).
Also, if someone's aware of a method to group column values for the AutoNumber function then please comment.
Note: I tried the above approach provided by KJM but it gave me the total count for each group instead of an incremental count.
@qasim_00 Instead you can try with peek and previous functions
table_name:
LOAD
*,
if(CUID<>previous(CUID),1,rangesum(peek('DailyTransactionCount'),1)) as DailyTransactionCount
Resident deposits_table
Order By CUID,Date,TimeStamp;
Thank you @Kushal_Chawda! I had to append the clause to check previous date as well.
table_name:
LOAD
*,
if(deposits_table.CUID <> previous([deposits_table.CUID]) or Date <> previous(Date),1,rangesum(peek('deposits_counts_table.DailyTransactionCount'),1)) as DailyTransactionCount
Resident deposits_table
Order By deposits_table.CUID, Date, deposits_table.Timestamp;
Works much faster than my solution!