Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi Community,
I have an issue with a group by. I would like to group by the following:
The maximum date of the underlying case is 2018-06-29. I want to group and add a column by the maximum date with the latest status. So in this case, the maximum date is 2018-06-29 and the latest status should be 2.
Example:
LOAD * INLINE [
Client, Statusnr, Date1, MaxDate,
A, 1, 2018-05-26, 2018-06-29
A, 3, 2018-05-29, 2018-06-29
A, 2, 2018-06-29, 2018-06-29
] (delimiter is ',');
Ideal result:
LOAD * INLINE [
Client, Statusnr, Date1, MaxDate, LatestStatus
A, 1, 2018-05-26, 2018-06-29, 2
A, 3, 2018-05-29, 2018-06-29, 2
A, 2, 2018-06-29, 2018-06-29, 2
] (delimiter is ',');
Thank you in advance
Hi,
Look at the FirstSortedValue function. In your case it will be something like this:
T:
LOAD *, 1 AS Dummy INLINE [
Client, Statusnr, Date1, MaxDate
A, 1, 2018-05-26, 2018-06-29
B, 3, 2018-05-29, 2018-06-29
C, 2, 2018-06-29, 2018-06-29
] (delimiter is ',');
LEFT JOIN (T)
LOAD Dummy,
FirstSortedValue(Statusnr, -Date1) AS LatestStatus
RESIDENT T
GROUP BY Dummy
;
DROP FIELD Dummy;
Regards,
David
Hi,
Look at the FirstSortedValue function. In your case it will be something like this:
T:
LOAD *, 1 AS Dummy INLINE [
Client, Statusnr, Date1, MaxDate
A, 1, 2018-05-26, 2018-06-29
B, 3, 2018-05-29, 2018-06-29
C, 2, 2018-06-29, 2018-06-29
] (delimiter is ',');
LEFT JOIN (T)
LOAD Dummy,
FirstSortedValue(Statusnr, -Date1) AS LatestStatus
RESIDENT T
GROUP BY Dummy
;
DROP FIELD Dummy;
Regards,
David
Or just this
T:
LOAD * INLINE [
Client, Statusnr, Date1, MaxDate
A, 1, 2018-05-26, 2018-06-29
B, 3, 2018-05-29, 2018-06-29
C, 2, 2018-06-29, 2018-06-29
];
Join (T)
LOAD FirstSortedValue(Statusnr, -Date1) AS LatestStatus
Resident T;
Thanks David,
What is the purpose of the Dummy? Does this minimize the datasize?
Thanks in advance.
I don't think Dummy is doing anything... if you look at the script below... it essentially gives the same result without needing Dummy field.
Thanks Sunny! I thought it would minimize the database size. During a group by my data load time increases. Haven't found any solution yet.
'Dummy' is used in this case to group the whole data set by ('FirstSoredValue' is an aggregation function) and also as a key to link the result to all corresponding values (all in this case). If you have multiple entries for each client and you want to determine the status at that level, you can use 'Client' field instead.
Regards,
David
Plz try this
T:LOAD * INLINE [
Client, Statusnr, Date1
A, 1, 2018-05-26
A, 3, 2018-05-29
A, 2, 2018-06-29];
F:
load Client, Statusnr, Date(Date#(Date1,'YYYY-MM-DD'),'DD-MM-YYYY') as Date
Resident T;
drop table T;
inner join (F)
load FirstSortedValue(Statusnr,-Date) as Latest_Status,date(max(Date)) as Max_Date
Resident F;
Here is one where you don't really need aggregation
T:
LOAD Client,
Statusnr,
Date(Date1) as Date1;
LOAD * INLINE [
Client, Statusnr, Date1
A, 1, 2018-05-26
B, 3, 2018-05-29
C, 2, 2018-06-29
];
MaxDate:
LOAD Max(MaxDate) as MaxDate;
LOAD FieldValue('Date1', RecNo()) as MaxDate
AutoGenerate FieldValueCount('Date1');
LET vMaxDate = Peek('MaxDate');
Join (T)
LOAD Statusnr AS LatestStatus
Resident T
Where Date1 = $(vMaxDate);
Thanks for the explanation.