Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I am trying to create a list of locations associated with each value in a column (eg stopping pattern for a bus route), but the data sometimes has each location twice recording the arrival and departure. I am currently getting the list correct, but it includes the location twice when it occurs twice in the data. I would only like to see the location once even if the data records the arrival and departure.
The order of the location is important as it describes the order of stops.
eg
temp:
load * inline
[
busRoute, sequence, activity, location, time
B1, 1, O, loc_A, 9:00
B1, 10, A, loc_B, 9:10
B1, 11, D, loc_B, 9:11
B1, 20, A, loc_C, 9:20
B1, 21, D, loc_C, 9:21
B1, 31, T, loc_D, 9:30
B2, 1, O, loc_P, 9:00
B2, 10, A, loc_Q, 10:10
B2, 11, D, loc_Q, 10:11
B2, 20, A, loc_R, 10:20
B2, 21, D, loc_R, 10:21
B2, 31, T, loc_S, 10:30
];
// O - Origin
// A - Arrival
// D - Departure
// T - Terminate
left Join
load busRoute,
concat(DISTINCT location,'-',sequence) as stop_pattern
Resident temp
Group by busRoute;
produces stopping patterns for
B1: loc_A-loc_B-loc_B-loc_C-Loc_C-loc_D
B2: loc_P-loc_Q-loc_Q-loc_R-Loc_R-loc_S
What I'm looking for is:
B1: loc_A-loc_B-Loc_C-loc_D
B2: loc_P-loc_Q-loc_R-loc_S
Is there a way to achieve this?
Can you check this?
temp:
load * inline
[
busRoute, sequence, activity, location, time
B1, 1, O, loc_D, 9:00
B1, 10, A, loc_C, 9:10
B1, 11, D, loc_C, 9:11
B1, 20, A, loc_B, 9:20
B1, 21, D, loc_B, 9:21
B1, 31, T, loc_A, 9:30
B2, 1, O, loc_P, 9:00
B2, 10, A, loc_Q, 10:10
B2, 11, D, loc_Q, 10:11
B2, 20, A, loc_R, 10:20
B2, 21, D, loc_R, 10:21
B2, 31, T, loc_S, 10:30
];
Left Join(temp)
LOAD busRoute, Concat(DISTINCT If(activity <> 'A',location),'-',sequence) As Path
Resident temp
Group By busRoute
;
With this simple example, this works, not sure if your real data is more complex..
temp:
load * inline
[
busRoute, sequence, activity, location, time
B1, 1, O, loc_A, 9:00
B1, 10, A, loc_B, 9:10
B1, 11, D, loc_B, 9:11
B1, 20, A, loc_C, 9:20
B1, 21, D, loc_C, 9:21
B1, 31, T, loc_D, 9:30
B2, 1, O, loc_P, 9:00
B2, 10, A, loc_Q, 10:10
B2, 11, D, loc_Q, 10:11
B2, 20, A, loc_R, 10:20
B2, 21, D, loc_R, 10:21
B2, 31, T, loc_S, 10:30
];
// O - Origin
// A - Arrival
// D - Departure
// T - Terminate
Distinct:
Load distinct busRoute,
location as stop
Resident temp;
left Join(temp)
load busRoute,
concat(stop,'-') as stop_pattern
Resident Distinct
Group by busRoute;
Drop table Distinct;
Using Hierarchy.
tab1:
Hierarchy(location, ParentLoc,Node,,location,,'-')
LOAD *, location As Node, If(activity<>'O',Peek(location)) As ParentLoc
Where activity <> 'A';
LOAD * INLINE [
busRoute, sequence, activity, location, time
B1, 1, O, loc_A, 9:00
B1, 10, A, loc_B, 9:10
B1, 11, D, loc_B, 9:11
B1, 20, A, loc_C, 9:20
B1, 21, D, loc_C, 9:21
B1, 31, T, loc_D, 9:30
B2, 1, O, loc_P, 9:00
B2, 10, A, loc_Q, 10:10
B2, 11, D, loc_Q, 10:11
B2, 20, A, loc_R, 10:20
B2, 21, D, loc_R, 10:21
B2, 31, T, loc_S, 10:30
];
tab2:
NoConcatenate
LOAD busRoute&':'&Path As Output
Resident tab1
Where activity = 'T';
Drop Table tab1;
Output.
This one doesn't quite work the way i was hoping. The example may have just been a bit too simple.
If we reverse the location labels of one bus route (B1), the sequence is in alphabetical order, and i need it in the order of sequence. I added sequence to the distinct load, but that just allows the duplicate label to be seen as a distinct label because it has a distinct sequence number.
temp:
load * inline
[
busRoute, sequence, activity, location, time
B1, 1, O, loc_D, 9:00
B1, 10, A, loc_C, 9:10
B1, 11, D, loc_C, 9:11
B1, 20, A, loc_B, 9:20
B1, 21, D, loc_B, 9:21
B1, 31, T, loc_A, 9:30
B2, 1, O, loc_P, 9:00
B2, 10, A, loc_Q, 10:10
B2, 11, D, loc_Q, 10:11
B2, 20, A, loc_R, 10:20
B2, 21, D, loc_R, 10:21
B2, 31, T, loc_S, 10:30
];
In this example the desired output is:
B1: loc_D-loc_C-loc_B-loc_A
B2: loc_P-loc_Q-loc_R-loc_S
Can you check this?
temp:
load * inline
[
busRoute, sequence, activity, location, time
B1, 1, O, loc_D, 9:00
B1, 10, A, loc_C, 9:10
B1, 11, D, loc_C, 9:11
B1, 20, A, loc_B, 9:20
B1, 21, D, loc_B, 9:21
B1, 31, T, loc_A, 9:30
B2, 1, O, loc_P, 9:00
B2, 10, A, loc_Q, 10:10
B2, 11, D, loc_Q, 10:11
B2, 20, A, loc_R, 10:20
B2, 21, D, loc_R, 10:21
B2, 31, T, loc_S, 10:30
];
Left Join(temp)
LOAD busRoute, Concat(DISTINCT If(activity <> 'A',location),'-',sequence) As Path
Resident temp
Group By busRoute
;