Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
I have a requirement which are consists of ID, Status, Create and Update Dates.
From the source am getting more no.of records on same ID's. In that records i need to filter the data based on Max (Update Date) followed with Status.If we get multiple rows for the same Update date then we need to take the Priority of the Status.
Priority Status:
Act |
Rejected XYZ |
Pending XYZ |
Rejected ABC |
Pending ABC |
Dft |
Example:
ID | Status | Create_Date | Update_Date |
1 | Act | 9/7/2016 4:40:56 AM | 10/6/2016 4:04:03 AM |
1 | pending xyz | 9/7/2016 4:40:56 AM | 10/6/2016 4:04:03 AM |
1 | Dft | 9/7/2016 4:40:56 AM | 10/6/2016 4:03:05 AM |
1 | Rejected XYZ | 9/7/2016 4:40:56 AM | 10/6/2016 4:05:03 AM |
In the above data, for ID "1" i need to show the record below.
Required Output:
1 | Rejected XYZ | 9/7/2016 4:40:56 AM | 10/6/2016 4:05:03 AM |
The conditions to filter the data is
- Consider the maximum Update Date Value
- Still we are getting more no. of records for the same ID then based on the Priority Status we need to show the appropriate status value for the ID then we will get only one record for the ID with latest update date and latest Status.
Attached a sample data and follows by the desired output. Please let me know for more clarification on this.
Thanks
This script will to the trick :
Mapping_Priority:
MAPPING LOAD
capitalize(Status) as Status,
Priority;
LOAD * INLINE [
Status, Priority
Act, 1
Rejected XYZ, 2
Pending XYZ, 3
Rejected ABC, 4
Pending ABC, 5
Dft, 6
];
Temp_Update:
LOAD
ID&'/'&max(Update_Date) as ID_MaxUpDate
FROM
(ooxml, embedded labels, table is Sheet1)
group by ID;
Temp:
LOAD
ID,
capitalize(Status) as Status,
Create_Date,
num(Update_Date) as UpDateNum,
Update_Date,
applymap('Mapping_Priority',capitalize(Status),null()) as Status_Priority
FROM
(ooxml, embedded labels, table is Sheet1)
where exists(ID_MaxUpDate,ID&'/'&Update_Date);
Drop table Temp_Update;
Temp_Update2:
LOAD
ID&'/'&UpDateNum&'/'&min(Status_Priority) as ID_MaxUpDate_MinStatus
resident Temp
group by ID, UpDateNum ;
FinalTable:
LOAD
ID,
Status,
Create_Date,
Update_Date
resident Temp
where exists(ID_MaxUpDate_MinStatus,ID&'/'&UpDateNum&'/'&Status_Priority);
Drop tables Temp,Temp_Update2;
Guys, Can any help me on this. I would appreciate if you guys can spend some time on this to Provide your views.
Data:
LOAD
ID,
Status,
Create_Date,
Update_Date,
Ord(Left(Status,1)) as TEMPID
FROM
Report.xlsx
(ooxml, embedded labels, table is Sheet1);
NoConcatenate
Final:
Load
ID,
FirstSortedValue(Distinct Status,-Update_Date+TEMPID/100000) as Status,
FirstSortedValue(Create_Date,-Update_Date+TEMPID/100000) as Create_Date,
FirstSortedValue(Update_Date,-Update_Date+TEMPID/100000) as Update_Date
Resident Data
Group By ID;
Drop Table Data;
Or based on your priority...
Data:
LOAD
ID,
Status,
Create_Date,
Update_Date,
Pick(Match(Upper(Status),'ACT','REJECTED XYZ','PENDING XYZ','REJECTED ABC','PENDING ABC','DFT'),1,2,3,4,5,6) as TEMPID
FROM
Report.xlsx
(ooxml, embedded labels, table is Sheet1);
NoConcatenate
Final:
Load
ID,
FirstSortedValue(Distinct Status,-Update_Date+TEMPID/100000) as Status,
FirstSortedValue(Create_Date,-Update_Date+TEMPID/100000) as Create_Date,
FirstSortedValue(Update_Date,-Update_Date+TEMPID/100000) as Update_Date
Resident Data
Group By ID;
Drop Table Data;
This script will to the trick :
Mapping_Priority:
MAPPING LOAD
capitalize(Status) as Status,
Priority;
LOAD * INLINE [
Status, Priority
Act, 1
Rejected XYZ, 2
Pending XYZ, 3
Rejected ABC, 4
Pending ABC, 5
Dft, 6
];
Temp_Update:
LOAD
ID&'/'&max(Update_Date) as ID_MaxUpDate
FROM
(ooxml, embedded labels, table is Sheet1)
group by ID;
Temp:
LOAD
ID,
capitalize(Status) as Status,
Create_Date,
num(Update_Date) as UpDateNum,
Update_Date,
applymap('Mapping_Priority',capitalize(Status),null()) as Status_Priority
FROM
(ooxml, embedded labels, table is Sheet1)
where exists(ID_MaxUpDate,ID&'/'&Update_Date);
Drop table Temp_Update;
Temp_Update2:
LOAD
ID&'/'&UpDateNum&'/'&min(Status_Priority) as ID_MaxUpDate_MinStatus
resident Temp
group by ID, UpDateNum ;
FinalTable:
LOAD
ID,
Status,
Create_Date,
Update_Date
resident Temp
where exists(ID_MaxUpDate_MinStatus,ID&'/'&UpDateNum&'/'&Status_Priority);
Drop tables Temp,Temp_Update2;
Hello Ashokkumar,
Please refer the below script which will generate desired result using JOIN:
Report:
LOAD ID,
Status,
Match(UPPER(Status),'ACT', 'REJECTED XYZ', 'PENDING XYZ', 'REJECTED ABC', 'PENDING ABC', 'DFT') AS Position,
Create_Date,
Update_Date
FROM
Report.xlsx
(ooxml, embedded labels, table is Sheet1);
//Find the Max. Update Date for indivial IDs
INNER JOIN
LOAD ID AS ID,
MAX(TIMESTAMP(Update_Date)) AS Update_Date
RESIDENT Report
GROUP BY ID;
//If we get multiple rows for the same Update Date then we need to take the Priority of the Status.
INNER JOIN
LOAD ID,
MIN(Position) AS Position
RESIDENT Report
GROUP BY ID;
Hope this will help.
Thank you!
Rahul !
Thanks Thomas
Thanks Rahul