Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all, I need help with how to write a script.
I am sure this is really simple, but I am trying to select the only one row of data with multiple columns based on the value in one of the columns.
For example, I have the following data
Shipment | Event Type | Date Occured | Date Submitted |
---|---|---|---|
12345 | Estimated delivery | July 1, 2013 | June 25, 2013 |
12345 | Estimated delivery | June 30, 2013 | June 24, 2013 |
abcde | Actual delivery | June 14, 2013 | June 14, 2013 |
abcde | Estimated delivery | June 14, 2013 | June 10, 2013 |
but I would like to get only the most recent date occured per shipment, per event type. see below.
Shipment | Event Type | Date Occured | Most recent Date Submitted |
---|---|---|---|
12345 | Estimated Delivery | July 1, 2013 | June, 25, 2013 |
abcde | Actual Delivery | June 14, 2013 | June 14, 2013 |
abcde | Estimated Delivery | June 14, 2013 | June 10, 2013 |
I thought I could do something like
New Table:
noconcatenate
load distinct
[Shipment],
[event type],
[Date occured],
max([date Submitted]) as [most recent date submitted]
resident old table
group by [shipment],[event type], [date occured];
however since the date occured is different, I get two rows... I cant guarantee that the date something occured is the larger or smaller. In other words the date submitted has nothing to do with when it was submitted or previous dates, it can go up or down. I just need to only capture the most recent one.
Any help would be appreciated.
Thanks
Hi,
Try this code,
Test_Table:
Load * inline [
Shipment, Event_Type, Date_Occured, Date_Submitted
12345, Estimated Delivery, 01/07/2013, 25/06/2013
12345, Estimated Delivery, 30/06/2013, 24/06/2013
abcde, Actual Delivery, 14/06/2013, 14/06/2013
abcde, Estimated Delivery, 14/06/2013, 10/06/2013
];
inner join
load Shipment,
max(Date_Occured) as Date_Occured
resident Test_Table
group by Shipment;
Or you can find attached file.
Regards
Neetu Singh
You can load the data finding only the max(date submitted) and then left join to find the date the event ocurred
New_Table:
noconcatenate
load distinct
[Shipment],
[event type]
max([date Submitted]) as [most recent date submitted]
resident old table
group by [shipment],[event type];
Left Join (New_Table)
LOAD
[Shipment],
[event type],
[Date occured],
[date Submitted] as [most recent date submitted]
Resident old_table
However, you are going to link using the shipment, event type and date submitted, what can cause dupplicated results. So I suggest you use a Primary Key on your Old table ( f.i. RowNo()), so you can do a better join!
I hope it helps!
Luiz
Thanks for the help Neetu, but unfortunately the source file I have is a few million lines long, thus the inline load is not an option.
Hi Luiz,
I think I see what you mean. If I have a unique identifier in the original table (Row_Number) then I can then use it for my link when I left join the second table after I filter to just the max event on the first one.
One problem, I still do not know how to sort on the first table I create and get only the maximum. I will have two unique row numbers, thus I will get two maximum values when I "group by row_number"... I will only want one one row_number that has the most recent submission.
You are right, it's not going to work with the unique identifier.
What happens if you have the same date submitted for one shipment and event type?
F.i.
Shipment | Event Type | Date Occured | Date Submitted |
---|---|---|---|
12345 | Estimated delivery | July 1, 2013 | June 25, 2013 |
12345 | Estimated delivery | June 30, 2013 | June 25, 2013 |
Should you retrieve both lines if that occurs?
If you use Shipment, event type and date submitted to find the date occured, on your example (that doesn't have two equal date submitted for the same shipment and event type) you should get results like this
New_Table:
noconcatenate
load distinct
[Shipment],
[event type]
max([date Submitted]) as [most recent date submitted]
resident old table
group by [shipment],[event type];
Shipment | Event Type | Most recent Date Submitted |
---|---|---|
12345 | Estimated delivery | June 25, 2013 |
abcde | Actual delivery | June 14, 2013 |
abcde | Estimated delivery | June 10, 2013 |
Left Join (New_Table)
LOAD
[Shipment],
[event type],
[Date occured],
[date Submitted] as [most recent date submitted]
Resident old_table
Shipment | Event Type | Date Occured | Most recent Date Submitted |
---|---|---|---|
12345 | Estimated Delivery | July 1, 2013 | June, 25, 2013 |
abcde | Actual Delivery | June 14, 2013 | June 14, 2013 |
abcde | Estimated Delivery | June 14, 2013 | June 10, 2013 |
I think Neetu's solution is correct. She was using the inline table only to demonstrate. The load could be from a table or file instead.
-Rob
Hi
I have used INline only for demo as i don't have the table but you can use table directly from database or Qvds also in place of Inline table as mention below:
Test_Table:
Load Shipment,
Event_Type,
Date_Occured,
Date_Submitted
From "TableName"
inner join
load Shipment,
max(Date_Occured) as Date_Occured
resident Test_Table
group by Shipment;
Regards
Neetu Singh
Try this code
Data:
LOAD Shipment,[Event Type],
max(Date(DateSubmit,'MMMM DD, YYYY')) as MaxdateSubmit,
max(Date(DateOccur,'MMMM DD, YYYY')) as MaxDateOccur
Group by Shipment,[Event Type];
LOAD date(Date#([Date Occured],'MMMM DD, YYYY'),'MMMM DD, YYYY')AS DateOccur,
date(Date#([Date Submitted],'MMMM DD, YYYY'),'MMMM DD, YYYY')AS DateSubmit,* INLINE [
Shipment, Event Type, Date Occured, Date Submitted
12345, Estimated delivery, "July 1, 2013", "June 25, 2013"
12345, Estimated delivery, "June 30, 2013", "June 24, 2013"
abcde, Actual delivery, "June 14, 2013", "June 14, 2013"
abcde, Estimated delivery, "June 14, 2013", "June 10, 2013"
];
Teke below field in table box hope it helps
Output like this
Shipment | Event Type | MaxDateOccur | MaxdateSubmit |
12345 | Estimated delivery | July 01, 2013 | June 25, 2013 |
abcde | Actual delivery | June 14, 2013 | June 14, 2013 |
abcde | Estimated delivery | June 14, 2013 | June 10, 2013 |
see attachement
try this code
data:
LOAD Shipment,
[Event Type],
date(date#([Date Occured],'MMM DD, YYYY'),'MMM DD, YYYY') AS [Date Occured],
date(date#([Date Submitted],'MMM DD, YYYY'),'MMM DD, YYYY') AS [Date Submitted];
LOAD * INLINE [
Shipment, Event Type, Date Occured, Date Submitted
12345, Estimated delivery, "July 1, 2013", "June 25, 2013"
12345, Estimated delivery, "June 30, 2013", "June 24, 2013"
abcde, Actual delivery, "June 14, 2013", "June 14, 2013"
abcde, Estimated delivery, "June 14, 2013", "June 10, 2013"
];
OUTPUT:
LOAD
Shipment,
[Event Type],
DATE(MAX([Date Occured]),'MMM DD, YYYY') AS [MAX_Date Occured],
DATE(MAX([Date Submitted]),'MMM DD, YYYY') AS [MAX_Date Submitted]
Resident data
Group by Shipment,[Event Type];
then output like this
Shipment | Event Type | [Date Occured] | Date Submitted |
12345 | Estimated delivery | Jul 01, 2013 | Jun 25, 2013 |
abcde | Actual delivery | Jun 14, 2013 | Jun 14, 2013 |
abcde | Estimated delivery | Jun 14, 2013 | Jun 10, 2013 |