Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
pauldamen
Partner - Creator II
Partner - Creator II

Find most used field for mapping load

All,

I have 2 colums, an employee ID and a Team Id.

Unfortunatly we have some teams in the team id field that don't belong to that person, so I only want to take the team that comes up the most often. So for example:

Employee Id          TeamId

Paul                         1

Paul                         2

Paul                         2

Paul                         2

Paul                         2

In this example it will take TeamId 1 for the mapping load. But I want to find the Id that comes up the most (2 in this case) and us that for the mapping load.

How can I achieve this?

Regards, Paul

1 Solution

Accepted Solutions
sunny_talwar

My bad, I never really got to test the script I was proposing. This is what I wanted to suggest

Table:

Load * Inline [

EmployeeId,TeamId

Paul,1

Paul,2

Paul,2

Paul,2

Paul,2

];

Right Join (Table)

LOAD EmployeeId,

    FirstSortedValue(TeamId, -Count) as TeamId

Group By EmployeeId;

LOAD EmployeeId,

    TeamId,

    Count(TeamId) as Count

Resident Table

Group By EmployeeId, TeamId;

MappingTable:

Mapping

LOAD DISTINCT EmployeeId,

    TeamId

Resident Table;

DROP Table Table;

View solution in original post

9 Replies
sunny_talwar

May be remove this information before you create the mapping table...

Table:

LOAD [Employee Id],

           TeamId

FROM Source;

Left Join (Table)

LOAD [Employee Id],

           FirstSortedValue(TeamId, -Count)

Group By [Employee Id];

LOAD [Employee Id],

           TeamId

           Count(TeamId) as Count

Resident Table

Group By [Employee Id], TeamId;

MappingTable:

Mapping

LOAD [Employee Id],

           TeamId

Resident Table;

DROP Table Table;

pauldamen
Partner - Creator II
Partner - Creator II
Author

Hi

Thanks for the quick reply. Not working completly though

What is the -count in:

LOAD [Employee Id],

           FirstSortedValue(TeamId, -Count)

sunny_talwar

Picking the TeamId which shows up most often

sunny_talwar

My bad, try this:

Table:

LOAD [Employee Id],

          TeamId

FROM Source;

Left Join (Table)

LOAD [Employee Id],

          FirstSortedValue(TeamId, -Count) as TeamId

Group By [Employee Id];

LOAD [Employee Id],

          TeamId

          Count(TeamId) as Count

Resident Table

Group By [Employee Id], TeamId;

MappingTable:

Mapping

LOAD [Employee Id],

          TeamId

Resident Table;

DROP Table Table;

pauldamen
Partner - Creator II
Partner - Creator II
Author

Still not working, your sure this should do the trick? I still get an alphabetic sort

stigchel
Partner - Master
Partner - Master

Hi Sunny,

I looked at this and then saw your answer, however could not quite get it to work but came up with this:

Tmp:

Load *,Count(TeamId) as Cnt Inline [EmployeeId,TeamId

Paul,1

Paul,2

Paul,2

Paul,2

Paul,2

] group by EmployeeId,TeamId;

Map:

Mapping Load EmployeeId,FirstSortedValue(TeamId,-Cnt) As TeamId 

resident Tmp Group By EmployeeId;

Data:

LOAD EmployeeId,ApplyMap('Map',EmployeeId) As TeamId Resident Tmp;

Drop Table Tmp;

stigchel
Partner - Master
Partner - Master

Example attached

ArnadoSandoval
Specialist II
Specialist II

G'Day Paul,

I will suggest something different, you should use the Aggregated function Mode() to resolve your problem:

Mode.png

I added more employees to your sample in order to illustrate how to work with the Mode() function as there is something tricky with its behavior; I kept Paul and added some random WTA Tennis players names; Serena associated with one team, Angie appearing in two teams; Garbine appearing once in three teams and finally Simona also appearing in three teams, but twice in teams 1 and 2.

Employees_Teams.png

All these employees were loaded into a tmp table with the standard LOAD * INLINE statement; next we need a Temporary Table, which I named T1:

T1.png

NOTE: I added an extra record for Simona where she belong to the TeamId 4.


The table T1 shows the employees belonging to teams having the maximum number of occurrences as highlighted on the column filter; Aga's team 2; Angie's teams 1 and 2; Garbine's teams 1, 2 and 3; Paul's team 2; Serena's team 1 and Simona teams 1 and 2.

This is the script building the table t1:

t1:

LOAD t1_EmployeeId,

     t1_MyTeam,

     t1_TeamId_Cnt,

     t1_EmployeeId & '|' & t1_TeamId_Cnt as filter;

    

Load

     EmployeeId       as t1_EmployeeId,

     TeamId           as t1_MyTeam,

     Count(TeamId)    as t1_TeamId_Cnt

Resident Tmp

Group by EmployeeId, TeamId;


Now we need a second temporary table, which I named t2, this table identifies the records where each employee has the maximum number of occurrences in a team, which is the column filter on the table T1.

T2.png

This is the script creating table T2:

t2:

Load

     t2_EmployeeId,

     t2_TeamId_Cnt,

     t2_EmployeeId & '|' & t2_TeamId_Cnt as filter3;

LOAD

     t1_EmployeeId      as t2_EmployeeId,

     Max(t1_TeamId_Cnt) as t2_TeamId_Cnt

Resident t1

group by t1_EmployeeId;


Now, if we join the tables 1 and 2 on the filter and filter3 columns we get the records where the employee is assigned to TeamId most often; we do that by creating table T3.


NOTE: I am explaining this solution step by step, that is why many temporary tables where introduced, it is up to you to understand the logic and remove excessive temporary tables.


The table T3 looks like this:

T3.png

In this table, the Teams Id for each employee are the teams where they were assigned most often; Aga, Paul and Serena appear only once because they were assigned to their teams more often, Aga twice to team 2, Paul four times to team 2, while Serena was assigned to team 1 only one time. Angie, Garbine and Simona are appearing multiple times because they were assigned to teams the same number of time; notice that Simona's teams 3 and 4 do not appear because she was assigned twice to teams 1 and 2.


All these temporary tables are required to address the scenario illustrated by Simona, where she was assigned to multiple teams, with single assignments to teams ID greater than the teams where she was assigned more often.


The script building table 3 is shown next (NOTE: I did not reference the table T1, I just re-coded its script, this will allow me to select the records on the column filter3)


t3:

LOAD

     t1_EmployeeId  as t3_EmployeeId,

     t1_MyTeam      as t3_MyTeam

Where Exists(filter3);

    

LOAD t1_EmployeeId,

     t1_MyTeam,

     t1_TeamId_Cnt,

     t1_EmployeeId & '|' & t1_TeamId_Cnt as filter3; 

Load

     EmployeeId       as t1_EmployeeId,

     TeamId           as t1_MyTeam,

     Count(TeamId)    as t1_TeamId_Cnt

Resident Tmp

Group by EmployeeId, TeamId;


The final step is the one that resolve your problem, we created the table Result

Result.png

The script creating this table is shown below:

Result:    

Load

     t3_EmployeeId as EmployeeId,

     if(IsNull(Mode(t3_MyTeam)) = -1, Max(t3_MyTeam), Mode(t3_MyTeam)) as MyTeam

Resident t3

Group by t3_EmployeeId;

Here the focus is with the column MyTeam:

if(IsNull(Mode(t3_MyTeam)) = -1, Max(t3_MyTeam, 2), Mode(t3_MyTeam)) as MyTeam

The Mode(t3_MyTeam) function returns NULL (value -1) when an employee belong to more than one Team the same number of times, which is the case for Garbine, Angie and Simona in those cases, we picked up the Maximum Team Id.

I will paste the whole script next:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Tmp:

Load * Inline [EmployeeId,TeamId

Paul,1

Paul,2

Paul,2

Paul,2

Paul,2

Serena, 1

Angie, 1

Angie, 2

Garbine, 1

Garbine, 2

Garbine, 3

Aga, 1

Aga, 2

Aga, 2

Aga, 3

Simona, 1

Simona, 1

Simona, 2

Simona, 2

Simona, 3

Simona, 4

];

t1:

LOAD t1_EmployeeId,

     t1_MyTeam,

     t1_TeamId_Cnt,

     t1_EmployeeId & '|' & t1_TeamId_Cnt as filter;

    

Load

     EmployeeId       as t1_EmployeeId,

     TeamId           as t1_MyTeam,

     Count(TeamId)    as t1_TeamId_Cnt

Resident Tmp

Group by EmployeeId, TeamId;

NoConcatenate

t2:

Load

     t2_EmployeeId,

     t2_TeamId_Cnt,

     t2_EmployeeId & '|' & t2_TeamId_Cnt as filter3;

LOAD

     t1_EmployeeId      as t2_EmployeeId,

     Max(t1_TeamId_Cnt) as t2_TeamId_Cnt

Resident t1

group by t1_EmployeeId;

NoConcatenate    

t3:

LOAD

     t1_EmployeeId  as t3_EmployeeId,

     t1_MyTeam      as t3_MyTeam

Where Exists(filter3);

    

LOAD t1_EmployeeId,

     t1_MyTeam,

     t1_TeamId_Cnt,

     t1_EmployeeId & '|' & t1_TeamId_Cnt as filter3; 

Load

     EmployeeId       as t1_EmployeeId,

     TeamId           as t1_MyTeam,

     Count(TeamId)    as t1_TeamId_Cnt

Resident Tmp

Group by EmployeeId, TeamId;

  

Result:    

Load

     t3_EmployeeId as EmployeeId,

     if(IsNull(Mode(t3_MyTeam)) = -1, Max(t3_MyTeam), Mode(t3_MyTeam)) as MyTeam

Resident t3

Group by t3_EmployeeId;

Base_Data:

Load

      EmployeeId       as Data_EmployeeId,

      TeamId           as Data_TeamId,

      RecNo()          as RecNo

Resident Tmp;

     

Drop table Tmp;

DROP Table t1;

DROP Table t2;

DROP Table t3;

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Hope this helps

Arnaldo Sandoval
A journey of a thousand miles begins with a single step.
sunny_talwar

My bad, I never really got to test the script I was proposing. This is what I wanted to suggest

Table:

Load * Inline [

EmployeeId,TeamId

Paul,1

Paul,2

Paul,2

Paul,2

Paul,2

];

Right Join (Table)

LOAD EmployeeId,

    FirstSortedValue(TeamId, -Count) as TeamId

Group By EmployeeId;

LOAD EmployeeId,

    TeamId,

    Count(TeamId) as Count

Resident Table

Group By EmployeeId, TeamId;

MappingTable:

Mapping

LOAD DISTINCT EmployeeId,

    TeamId

Resident Table;

DROP Table Table;