Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
My source data is something like:
Name Rank
-------- ------
John 100
John 200
Mary 300
tab_1:
LOAD Name, Rank
FROM <Source>;
tab_2:
LOAD Name, MAX (Rank) as MaxRank
Resident tab_1
Group BY Name;
// Now I need to do a inner join on between tables tab_1 and tab_2 so I can filter out rows on tab_1 based on aggregated rows from tab_2. So, I should come up with the output:
Name Rank
-------- -------
John 200
Mary 300
Perhaps doing something like (pseudo-code):
inner join (tab_1)
load *
resident tab_1;
Here's my question:
How do I join these tables knowing that MaxRank from tab_2 and Rank from tab_1 have different names? Name is fine but the aggregated names for Rank are different.
Hi,
I would suggest you add JOIN to the definition of tab_2, so:
[tab_1]:
Load * Inline [
Name,Rank
John,100
John,200
Mary,300
];
Join LOAD Name, MAX (Rank) as MaxRank
Resident tab_1
Group BY Name;
This will create a single table that includes both Rank and MaxRank
You can then include an additional table where you filter the data to only show records where Rank=MaxRank and then drop the original table:
[tab_1]:
Load * Inline [
Name,Rank
John,100
John,200
Mary,300
];
Join LOAD Name, MAX (Rank) as MaxRank
Resident tab_1
Group BY Name;
tab_final:
Load Name,Rank
Resident [tab_1]
Where Rank=MaxRank;
Drop table tab_1;
This will leave you with the table you are looking for:
Just name the rank-fields identically and it should work, means:
...
... MAX (Rank) as Rank ...
...
- Marcus
Hi,
I would suggest you add JOIN to the definition of tab_2, so:
[tab_1]:
Load * Inline [
Name,Rank
John,100
John,200
Mary,300
];
Join LOAD Name, MAX (Rank) as MaxRank
Resident tab_1
Group BY Name;
This will create a single table that includes both Rank and MaxRank
You can then include an additional table where you filter the data to only show records where Rank=MaxRank and then drop the original table:
[tab_1]:
Load * Inline [
Name,Rank
John,100
John,200
Mary,300
];
Join LOAD Name, MAX (Rank) as MaxRank
Resident tab_1
Group BY Name;
tab_final:
Load Name,Rank
Resident [tab_1]
Where Rank=MaxRank;
Drop table tab_1;
This will leave you with the table you are looking for: