Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
RoyBatty
Contributor III
Contributor III

Load only one record per user per month

Hi,

There is the following table:

[payments]:
Load * Inline
[
ID, USER_ID, AMOUNT, CREATED_AT
1, 1, 20, 2023-01-30 06:38:36
2, 2, 15, 2023-01-30 07:17:17
3, 1, 25, 2023-01-31 08:18:18
4, 3, 10, 2023-01-31 09:19:19
5, 3, 20, 2023-01-31 10:10:10
6, 1, 15, 2023-01-31 11:11:11

7, 1, 20, 2023-02-25 06:38:36
8, 2, 15, 2023-02-25 07:17:17
9, 1, 25, 2023-02-26 08:18:18
10, 1, 15, 2023-02-27 11:11:11

11, 2, 15, 2023-03-28 07:17:17
12, 3, 10, 2023-03-29 09:19:19
13, 3, 20, 2023-03-31 10:10:10
14, 3, 25, 2023-03-31 11:11:11
];

and it looks like this:

RoyBatty_0-1700082160229.png

 

From that table, we want to create a new table that will be identical but will have only one record for each user per month. It does not matter whether it is the first or the last user's payment of the month, it is only important that new table contains only 1 user's payment in a certain month (if he had payment(s) in that month). For example, that table might look like this:

RoyBatty_1-1700082805539.png

 

Do you have any suggestions on how we can achieve this? 🤔

Labels (1)
1 Solution

Accepted Solutions
whiteymcaces
Partner - Creator
Partner - Creator

After your inline load script do this:

Temp1:
Load
Max(ID) as ID,
USER_ID,
Month(CREATED_AT) as Month
Resident payments
Group By Month(CREATED_AT), USER_ID
;
 
Left Join (Temp1)
Load
ID,
AMOUNT,
CREATED_AT
Resident payments
;
 
Drop Table payments;

View solution in original post

6 Replies
whiteymcaces
Partner - Creator
Partner - Creator

After your inline load script do this:

Temp1:
Load
Max(ID) as ID,
USER_ID,
Month(CREATED_AT) as Month
Resident payments
Group By Month(CREATED_AT), USER_ID
;
 
Left Join (Temp1)
Load
ID,
AMOUNT,
CREATED_AT
Resident payments
;
 
Drop Table payments;
sidhiq91
Specialist II
Specialist II

@RoyBatty  Please use the below code.

[payments]:
Load ID, USER_ID, AMOUNT,
Date(Monthstart(Date(CREATED_AT,'MM/DD/YYYY')),'MM/DD/YYYY') as Date,
AutoNumber(USER_ID&Date(Monthstart(Date(CREATED_AT,'MM/DD/YYYY')),'MM/DD/YYYY')) as Key

Inline
[
ID, USER_ID, AMOUNT, CREATED_AT
1, 1, 20, 2023-01-30 06:38:36
2, 2, 15, 2023-01-30 07:17:17
3, 1, 25, 2023-01-31 08:18:18
4, 3, 10, 2023-01-31 09:19:19
5, 3, 20, 2023-01-31 10:10:10
6, 1, 15, 2023-01-31 11:11:11

7, 1, 20, 2023-02-25 06:38:36
8, 2, 15, 2023-02-25 07:17:17
9, 1, 25, 2023-02-26 08:18:18
10, 1, 15, 2023-02-27 11:11:11

11, 2, 15, 2023-03-28 07:17:17
12, 3, 10, 2023-03-29 09:19:19
13, 3, 20, 2023-03-31 10:10:10
14, 3, 25, 2023-03-31 11:11:11
];

NoConcatenate
Temp1:
Load distinct Key,
USER_ID,Date
resident [payments];

Drop table [payments];

Exit Script;

Let me know if it resolves your issue.

RoyBatty
Contributor III
Contributor III
Author

Thanks!

If I add other fields to the "Temp1" table (ID, AMOUNT, CREATED_AT):

NoConcatenate
Temp1:
Load distinct Key,
ID, //
USER_ID,
AMOUNT, //
CREATED_AT, //
Date
resident [payments];

then it will  load all 14 records again:

RoyBatty_0-1700132881430.png

 

(we need these fields in that new table).

Thanks again!

RoyBatty
Contributor III
Contributor III
Author

Thanks!

Your solution works, but I haven't tested it with real data yet. A possible problem is that the real "payments" table actually has 10 million records, so I wonder if "Left Join" can be problematic in that case.

@sidhiq91  offered another solution without "Left Join", but it has a limitation and cannot have all the same columns as the "payments" table. Anyway, I'll be testing all that soon, and then I'll accept solution.

Thanks again!

RoyBatty
Contributor III
Contributor III
Author

Hi @whiteymcaces

When there are records from different years, then it will not work properly because the "Month()" function is used and grouping is done by month without year. For example, if the last 4 rows are from a different year:

 

[payments]:
Load * Inline
[
ID, USER_ID, AMOUNT, CREATED_AT
1, 1, 20, 2023-01-30 06:38:36
2, 2, 15, 2023-01-30 07:17:17
3, 1, 25, 2023-01-31 08:18:18
4, 3, 10, 2023-01-31 09:19:19
5, 3, 20, 2023-01-31 10:10:10
6, 1, 15, 2023-01-31 11:11:11

7, 1, 20, 2023-02-25 06:38:36
8, 2, 15, 2023-02-25 07:17:17
9, 1, 25, 2023-02-26 08:18:18
10, 1, 15, 2023-02-27 11:11:11

11, 2, 15, 2024-01-28 07:17:17
12, 3, 10, 2024-01-29 09:19:19
13, 3, 20, 2024-01-31 10:10:10
14, 3, 25, 2024-01-31 11:11:11
];

 it will give the following result:

RoyBatty_0-1700177951807.png

 

So I just added &' ' & Year(CREATED_AT):

Temp1:
Load
Max(ID) as ID,
USER_ID,
Month(CREATED_AT) &' ' & Year(CREATED_AT) as Month
Resident payments
Group By Month(CREATED_AT) &' ' & Year(CREATED_AT), USER_ID
;

Left Join (Temp1)
Load
ID,
AMOUNT,
CREATED_AT
Resident payments
;

Drop Table payments;

and it works fine:

RoyBatty_1-1700178563682.png

 

whiteymcaces
Partner - Creator
Partner - Creator

You could also use the MonthStart function.

i.e. 

Group By MonthStart(CREATED_AT), USER_ID