Skip to main content
Announcements
Global Transformation Awards submissions are open! SUBMIT YOUR STORY
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Selecting one row from many based on the value in one column

Hi all, I need help with how to write a script.

I am sure this is really simple, but I am trying to select the only one row of data with multiple columns based on the value in one of the columns.

For example, I have the following data

ShipmentEvent TypeDate OccuredDate Submitted
12345Estimated deliveryJuly 1, 2013June 25, 2013
12345Estimated deliveryJune 30, 2013June 24, 2013
abcdeActual deliveryJune 14, 2013June 14, 2013
abcdeEstimated deliveryJune 14, 2013June 10, 2013

but I would like to get only the most recent date occured per shipment, per event type. see below.

ShipmentEvent TypeDate OccuredMost recent Date Submitted
12345Estimated DeliveryJuly 1, 2013June, 25, 2013
abcdeActual DeliveryJune 14, 2013June 14, 2013
abcdeEstimated DeliveryJune 14, 2013June 10, 2013

I thought I could do something like

New Table:

noconcatenate

load distinct

[Shipment],

[event type],

[Date occured],

max([date Submitted]) as [most recent date submitted]

resident old table

group by [shipment],[event type], [date occured];

however since the date occured is different, I get two rows... I cant guarantee that the date something occured is the larger or smaller. In other words the date submitted has nothing to do with when it was submitted or previous dates, it can go up or down. I just need to only capture the most recent one.

Any help would be appreciated.

Thanks

10 Replies
neetu_singh
Partner - Creator III
Partner - Creator III

Hi,

Try this code,

Test_Table:
Load * inline [
Shipment, Event_Type, Date_Occured, Date_Submitted
12345, Estimated Delivery, 01/07/2013, 25/06/2013
12345, Estimated Delivery, 30/06/2013, 24/06/2013
abcde, Actual Delivery, 14/06/2013, 14/06/2013
abcde, Estimated Delivery, 14/06/2013, 10/06/2013
];

inner join

load Shipment,
max(Date_Occured) as Date_Occured
resident Test_Table

group by Shipment;

Or you can find attached file.

Regards

Neetu Singh

Not applicable
Author

You can load the data finding only the max(date submitted) and then left join to find the date the event ocurred

New_Table:

noconcatenate

load distinct

[Shipment],

[event type]

max([date Submitted]) as [most recent date submitted]

resident old table

group by [shipment],[event type];

Left Join (New_Table)

LOAD    

[Shipment],

[event type],

[Date occured],

[date Submitted] as [most recent date submitted]

Resident old_table

However, you are going to link using the shipment, event type and date submitted, what can cause dupplicated results. So I suggest you use a Primary Key on your Old table ( f.i. RowNo()), so you can do a better join!

I hope it helps!

Luiz

Not applicable
Author

Thanks for the help Neetu, but unfortunately the source file I have is a few million lines long, thus the inline load is not an option.

Not applicable
Author

Hi Luiz,

I think I see what you mean. If I have a unique identifier in the original table (Row_Number) then I can then use it for my link when I left join the second table after I filter to just the max event on the first one. 

One problem, I still do not know how to sort on the first table I create and get only the maximum. I will have two unique row numbers, thus I will get two maximum values when I "group by row_number"... I will only want one one row_number that has the most recent submission.

Not applicable
Author

You are right, it's not going to work with the unique identifier.

What happens if you have the same date submitted for one shipment and event type?

F.i.

ShipmentEvent TypeDate OccuredDate Submitted
12345Estimated deliveryJuly 1, 2013June 25, 2013
12345Estimated deliveryJune 30, 2013June 25, 2013

Should you retrieve both lines if that occurs?

If you use Shipment, event type and date submitted to find the date occured, on your example (that doesn't have two equal date submitted for the same shipment and event type) you should get results like this

New_Table:

noconcatenate

load distinct

[Shipment],

[event type]

max([date Submitted]) as [most recent date submitted]

resident old table

group by [shipment],[event type];

ShipmentEvent TypeMost recent Date Submitted
12345Estimated deliveryJune 25, 2013
abcdeActual deliveryJune 14, 2013
abcdeEstimated deliveryJune 10, 2013

Left Join (New_Table)

LOAD   

[Shipment],

[event type],

[Date occured],

[date Submitted] as [most recent date submitted]

Resident old_table

ShipmentEvent TypeDate OccuredMost recent Date Submitted
12345Estimated DeliveryJuly 1, 2013June, 25, 2013
abcdeActual DeliveryJune 14, 2013June 14, 2013
abcdeEstimated DeliveryJune 14, 2013June 10, 2013


rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

I think Neetu's solution is correct. She was using the inline table only to demonstrate. The load could be from a table or file instead.

-Rob

neetu_singh
Partner - Creator III
Partner - Creator III

Hi

I have used INline only for demo as i don't have the table but you can use table directly from database or Qvds also in place of Inline table as mention below:

Test_Table:
Load Shipment,

        Event_Type,

        Date_Occured,

        Date_Submitted
From "TableName"

inner join

load Shipment,
max(Date_Occured) as Date_Occured
resident Test_Table

group by Shipment;

Regards

Neetu Singh

er_mohit
Master II
Master II

Try this code

Data:

LOAD Shipment,[Event Type],

max(Date(DateSubmit,'MMMM DD, YYYY')) as MaxdateSubmit,

max(Date(DateOccur,'MMMM DD, YYYY')) as MaxDateOccur

Group by Shipment,[Event Type];

LOAD date(Date#([Date Occured],'MMMM DD, YYYY'),'MMMM DD, YYYY')AS DateOccur,

date(Date#([Date Submitted],'MMMM DD, YYYY'),'MMMM DD, YYYY')AS DateSubmit,* INLINE [

    Shipment, Event Type, Date Occured, Date Submitted

    12345, Estimated delivery, "July 1, 2013", "June 25, 2013"

    12345, Estimated delivery, "June 30, 2013", "June 24, 2013"

    abcde, Actual delivery, "June 14, 2013", "June 14, 2013"

    abcde, Estimated delivery, "June 14, 2013", "June 10, 2013"

];

Teke below field in table box hope it helps

Output like this

ShipmentEvent TypeMaxDateOccurMaxdateSubmit
12345Estimated deliveryJuly 01, 2013June 25, 2013
abcdeActual deliveryJune 14, 2013June 14, 2013
abcdeEstimated deliveryJune 14, 2013June 10, 2013
Not applicable
Author

see attachement

try this code

data:

LOAD Shipment,

    [Event Type],

        date(date#([Date Occured],'MMM DD, YYYY'),'MMM DD, YYYY') AS [Date Occured],

            date(date#([Date Submitted],'MMM DD, YYYY'),'MMM DD, YYYY') AS [Date Submitted];

LOAD * INLINE [

    Shipment, Event Type, Date Occured, Date Submitted

    12345, Estimated delivery, "July 1, 2013", "June 25, 2013"

    12345, Estimated delivery, "June 30, 2013", "June 24, 2013"

    abcde, Actual delivery, "June 14, 2013", "June 14, 2013"

    abcde, Estimated delivery, "June 14, 2013", "June 10, 2013"

];

OUTPUT:

LOAD

Shipment,

[Event Type],

DATE(MAX([Date Occured]),'MMM DD, YYYY') AS [MAX_Date Occured],

DATE(MAX([Date Submitted]),'MMM DD, YYYY') AS [MAX_Date Submitted]

Resident data

Group by Shipment,[Event Type];

then output like this

ShipmentEvent Type[Date Occured]Date Submitted
12345Estimated deliveryJul 01, 2013Jun 25, 2013
abcdeActual deliveryJun 14, 2013Jun 14, 2013
abcdeEstimated deliveryJun 14, 2013Jun 10, 2013