Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
All,
I have 2 colums, an employee ID and a Team Id.
Unfortunatly we have some teams in the team id field that don't belong to that person, so I only want to take the team that comes up the most often. So for example:
Employee Id TeamId
Paul 1
Paul 2
Paul 2
Paul 2
Paul 2
In this example it will take TeamId 1 for the mapping load. But I want to find the Id that comes up the most (2 in this case) and us that for the mapping load.
How can I achieve this?
Regards, Paul
My bad, I never really got to test the script I was proposing. This is what I wanted to suggest
Table:
Load * Inline [
EmployeeId,TeamId
Paul,1
Paul,2
Paul,2
Paul,2
Paul,2
];
Right Join (Table)
LOAD EmployeeId,
FirstSortedValue(TeamId, -Count) as TeamId
Group By EmployeeId;
LOAD EmployeeId,
TeamId,
Count(TeamId) as Count
Resident Table
Group By EmployeeId, TeamId;
MappingTable:
Mapping
LOAD DISTINCT EmployeeId,
TeamId
Resident Table;
DROP Table Table;
May be remove this information before you create the mapping table...
Table:
LOAD [Employee Id],
TeamId
FROM Source;
Left Join (Table)
LOAD [Employee Id],
FirstSortedValue(TeamId, -Count)
Group By [Employee Id];
LOAD [Employee Id],
TeamId
Count(TeamId) as Count
Resident Table
Group By [Employee Id], TeamId;
MappingTable:
Mapping
LOAD [Employee Id],
TeamId
Resident Table;
DROP Table Table;
Hi
Thanks for the quick reply. Not working completly though
What is the -count in:
LOAD [Employee Id],
FirstSortedValue(TeamId, -Count)
Picking the TeamId which shows up most often
My bad, try this:
Table:
LOAD [Employee Id],
TeamId
FROM Source;
Left Join (Table)
LOAD [Employee Id],
FirstSortedValue(TeamId, -Count) as TeamId
Group By [Employee Id];
LOAD [Employee Id],
TeamId
Count(TeamId) as Count
Resident Table
Group By [Employee Id], TeamId;
MappingTable:
Mapping
LOAD [Employee Id],
TeamId
Resident Table;
DROP Table Table;
Still not working, your sure this should do the trick? I still get an alphabetic sort
Hi Sunny,
I looked at this and then saw your answer, however could not quite get it to work but came up with this:
Tmp:
Load *,Count(TeamId) as Cnt Inline [EmployeeId,TeamId
Paul,1
Paul,2
Paul,2
Paul,2
Paul,2
] group by EmployeeId,TeamId;
Map:
Mapping Load EmployeeId,FirstSortedValue(TeamId,-Cnt) As TeamId
resident Tmp Group By EmployeeId;
Data:
LOAD EmployeeId,ApplyMap('Map',EmployeeId) As TeamId Resident Tmp;
Drop Table Tmp;
Example attached
G'Day Paul,
I will suggest something different, you should use the Aggregated function Mode() to resolve your problem:
I added more employees to your sample in order to illustrate how to work with the Mode() function as there is something tricky with its behavior; I kept Paul and added some random WTA Tennis players names; Serena associated with one team, Angie appearing in two teams; Garbine appearing once in three teams and finally Simona also appearing in three teams, but twice in teams 1 and 2.
All these employees were loaded into a tmp table with the standard LOAD * INLINE statement; next we need a Temporary Table, which I named T1:
NOTE: I added an extra record for Simona where she belong to the TeamId 4.
The table T1 shows the employees belonging to teams having the maximum number of occurrences as highlighted on the column filter; Aga's team 2; Angie's teams 1 and 2; Garbine's teams 1, 2 and 3; Paul's team 2; Serena's team 1 and Simona teams 1 and 2.
This is the script building the table t1:
t1:
LOAD t1_EmployeeId,
t1_MyTeam,
t1_TeamId_Cnt,
t1_EmployeeId & '|' & t1_TeamId_Cnt as filter;
Load
EmployeeId as t1_EmployeeId,
TeamId as t1_MyTeam,
Count(TeamId) as t1_TeamId_Cnt
Resident Tmp
Group by EmployeeId, TeamId;
Now we need a second temporary table, which I named t2, this table identifies the records where each employee has the maximum number of occurrences in a team, which is the column filter on the table T1.
This is the script creating table T2:
t2:
Load
t2_EmployeeId,
t2_TeamId_Cnt,
t2_EmployeeId & '|' & t2_TeamId_Cnt as filter3;
LOAD
t1_EmployeeId as t2_EmployeeId,
Max(t1_TeamId_Cnt) as t2_TeamId_Cnt
Resident t1
group by t1_EmployeeId;
Now, if we join the tables 1 and 2 on the filter and filter3 columns we get the records where the employee is assigned to TeamId most often; we do that by creating table T3.
NOTE: I am explaining this solution step by step, that is why many temporary tables where introduced, it is up to you to understand the logic and remove excessive temporary tables.
The table T3 looks like this:
In this table, the Teams Id for each employee are the teams where they were assigned most often; Aga, Paul and Serena appear only once because they were assigned to their teams more often, Aga twice to team 2, Paul four times to team 2, while Serena was assigned to team 1 only one time. Angie, Garbine and Simona are appearing multiple times because they were assigned to teams the same number of time; notice that Simona's teams 3 and 4 do not appear because she was assigned twice to teams 1 and 2.
All these temporary tables are required to address the scenario illustrated by Simona, where she was assigned to multiple teams, with single assignments to teams ID greater than the teams where she was assigned more often.
The script building table 3 is shown next (NOTE: I did not reference the table T1, I just re-coded its script, this will allow me to select the records on the column filter3)
t3:
LOAD
t1_EmployeeId as t3_EmployeeId,
t1_MyTeam as t3_MyTeam
Where Exists(filter3);
LOAD t1_EmployeeId,
t1_MyTeam,
t1_TeamId_Cnt,
t1_EmployeeId & '|' & t1_TeamId_Cnt as filter3;
Load
EmployeeId as t1_EmployeeId,
TeamId as t1_MyTeam,
Count(TeamId) as t1_TeamId_Cnt
Resident Tmp
Group by EmployeeId, TeamId;
The final step is the one that resolve your problem, we created the table Result
The script creating this table is shown below:
Result:
Load
t3_EmployeeId as EmployeeId,
if(IsNull(Mode(t3_MyTeam)) = -1, Max(t3_MyTeam), Mode(t3_MyTeam)) as MyTeam
Resident t3
Group by t3_EmployeeId;
Here the focus is with the column MyTeam:
if(IsNull(Mode(t3_MyTeam)) = -1, Max(t3_MyTeam, 2), Mode(t3_MyTeam)) as MyTeam
The Mode(t3_MyTeam) function returns NULL (value -1) when an employee belong to more than one Team the same number of times, which is the case for Garbine, Angie and Simona in those cases, we picked up the Maximum Team Id.
I will paste the whole script next:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tmp:
Load * Inline [EmployeeId,TeamId
Paul,1
Paul,2
Paul,2
Paul,2
Paul,2
Serena, 1
Angie, 1
Angie, 2
Garbine, 1
Garbine, 2
Garbine, 3
Aga, 1
Aga, 2
Aga, 2
Aga, 3
Simona, 1
Simona, 1
Simona, 2
Simona, 2
Simona, 3
Simona, 4
];
t1:
LOAD t1_EmployeeId,
t1_MyTeam,
t1_TeamId_Cnt,
t1_EmployeeId & '|' & t1_TeamId_Cnt as filter;
Load
EmployeeId as t1_EmployeeId,
TeamId as t1_MyTeam,
Count(TeamId) as t1_TeamId_Cnt
Resident Tmp
Group by EmployeeId, TeamId;
NoConcatenate
t2:
Load
t2_EmployeeId,
t2_TeamId_Cnt,
t2_EmployeeId & '|' & t2_TeamId_Cnt as filter3;
LOAD
t1_EmployeeId as t2_EmployeeId,
Max(t1_TeamId_Cnt) as t2_TeamId_Cnt
Resident t1
group by t1_EmployeeId;
NoConcatenate
t3:
LOAD
t1_EmployeeId as t3_EmployeeId,
t1_MyTeam as t3_MyTeam
Where Exists(filter3);
LOAD t1_EmployeeId,
t1_MyTeam,
t1_TeamId_Cnt,
t1_EmployeeId & '|' & t1_TeamId_Cnt as filter3;
Load
EmployeeId as t1_EmployeeId,
TeamId as t1_MyTeam,
Count(TeamId) as t1_TeamId_Cnt
Resident Tmp
Group by EmployeeId, TeamId;
Result:
Load
t3_EmployeeId as EmployeeId,
if(IsNull(Mode(t3_MyTeam)) = -1, Max(t3_MyTeam), Mode(t3_MyTeam)) as MyTeam
Resident t3
Group by t3_EmployeeId;
Base_Data:
Load
EmployeeId as Data_EmployeeId,
TeamId as Data_TeamId,
RecNo() as RecNo
Resident Tmp;
Drop table Tmp;
DROP Table t1;
DROP Table t2;
DROP Table t3;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hope this helps
My bad, I never really got to test the script I was proposing. This is what I wanted to suggest
Table:
Load * Inline [
EmployeeId,TeamId
Paul,1
Paul,2
Paul,2
Paul,2
Paul,2
];
Right Join (Table)
LOAD EmployeeId,
FirstSortedValue(TeamId, -Count) as TeamId
Group By EmployeeId;
LOAD EmployeeId,
TeamId,
Count(TeamId) as Count
Resident Table
Group By EmployeeId, TeamId;
MappingTable:
Mapping
LOAD DISTINCT EmployeeId,
TeamId
Resident Table;
DROP Table Table;