Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
I'm having this sample data:
object_id, update_time, minVal , maxVal
31000001 2020-05-18 09:20:21 1.56 20.71
31000003 2020-05-18 09:20:21 3.33 16.42
31000005 2020-05-18 09:20:21 2.00 20.00
31000005 2020-05-18 10:20:21 3.00 25.00
I'd like to remove the duplicate row of object_id = 31000005 and have only one record for this id, based on the latest timestamp:
object_id, update_time, minVal , maxVal
31000001 2020-05-18 09:20:21 1.56 20.71
31000003 2020-05-18 09:20:21 3.33 16.42
31000005 2020-05-18 10:20:21 3.00 25.00
How can I achieve this result?
Thanks 🙂
You can also try
SET TimestampFormat='YYYY-MM-DD hh:mm:ss';
Table:
LOAD * INLINE [
object_id, update_time, minVal, maxVal
31000001, 2020-05-18 09:20:21, 1.56, 20.71
31000003, 2020-05-18 09:20:21, 3.33, 16.42
31000005, 2020-05-18 09:20:21, 2.00, 20.00
31000005, 2020-05-18 10:20:21, 3.00, 25.00
];
Right Join (Table)
LOAD object_id,
Max(update_time) as update_time
Resident Table
Group By object_id;
Maye be :
Data:
load rowNo() as ID,* inline [
object_id, update_time, minVal , maxVal
31000001 ,2020-05-18 09:20:21, 1.56 ,20.71
31000003 ,2020-05-18 09:20:21, 3.33 ,16.42
31000005 ,2020-05-18 09:20:21, 2.00, 20.00
31000005, 2020-05-18 10:20:21 ,3.00, 25.00
];
output:
load object_id,FirstSortedValue(ID,-Timestamp#(update_time,'YYYY-MM-DD hh:mm:ss')) as ID resident Data group by object_id;
left join load * resident Data;
drop table Data;
drop fields ID;
You can also try
SET TimestampFormat='YYYY-MM-DD hh:mm:ss';
Table:
LOAD * INLINE [
object_id, update_time, minVal, maxVal
31000001, 2020-05-18 09:20:21, 1.56, 20.71
31000003, 2020-05-18 09:20:21, 3.33, 16.42
31000005, 2020-05-18 09:20:21, 2.00, 20.00
31000005, 2020-05-18 10:20:21, 3.00, 25.00
];
Right Join (Table)
LOAD object_id,
Max(update_time) as update_time
Resident Table
Group By object_id;
@sunny_talwar Thanks, it did the trick.