Solved: Find most used field for mapping load - Qlik Community

pauldamen · ‎2016-09-05

All,

I have 2 colums, an employee ID and a Team Id.

Unfortunatly we have some teams in the team id field that don't belong to that person, so I only want to take the team that comes up the most often. So for example:

Employee Id TeamId

Paul 1

Paul 2

In this example it will take TeamId 1 for the mapping load. But I want to find the Id that comes up the most (2 in this case) and us that for the mapping load.

How can I achieve this?

Regards, Paul

sunny_talwar · ‎2016-09-06

My bad, I never really got to test the script I was proposing. This is what I wanted to suggest

Table:

Load * Inline [

EmployeeId,TeamId

Paul,1

Paul,2

];

Right Join (Table)

LOAD EmployeeId,

FirstSortedValue(TeamId, -Count) as TeamId

Group By EmployeeId;

LOAD EmployeeId,

TeamId,

Count(TeamId) as Count

Resident Table

Group By EmployeeId, TeamId;

MappingTable:

Mapping

LOAD DISTINCT EmployeeId,

TeamId

Resident Table;

DROP Table Table;

View solution in original post

sunny_talwar · ‎2016-09-05

May be remove this information before you create the mapping table...

Table:

LOAD [Employee Id],

TeamId

FROM Source;

Left Join (Table)

LOAD [Employee Id],

FirstSortedValue(TeamId, -Count)

Group By [Employee Id];

LOAD [Employee Id],

TeamId

Count(TeamId) as Count

Resident Table

Group By [Employee Id], TeamId;

MappingTable:

Mapping

LOAD [Employee Id],

TeamId

Resident Table;

DROP Table Table;

pauldamen · ‎2016-09-05

Hi

Thanks for the quick reply. Not working completly though

What is the -count in:

LOAD [Employee Id],

FirstSortedValue(TeamId, -Count)

sunny_talwar · ‎2016-09-05

Picking the TeamId which shows up most often

sunny_talwar · ‎2016-09-05

My bad, try this:

Table:

LOAD [Employee Id],

TeamId

FROM Source;

Left Join (Table)

LOAD [Employee Id],

FirstSortedValue(TeamId, -Count) as TeamId

Group By [Employee Id];

LOAD [Employee Id],

TeamId

Count(TeamId) as Count

Resident Table

Group By [Employee Id], TeamId;

MappingTable:

Mapping

LOAD [Employee Id],

TeamId

Resident Table;

DROP Table Table;

pauldamen · ‎2016-09-05

Still not working, your sure this should do the trick? I still get an alphabetic sort

stigchel · ‎2016-09-05

Hi Sunny,

I looked at this and then saw your answer, however could not quite get it to work but came up with this:

Tmp:

Load *,Count(TeamId) as Cnt Inline [EmployeeId,TeamId

Paul,1

Paul,2

] group by EmployeeId,TeamId;

Map:

Mapping Load EmployeeId,FirstSortedValue(TeamId,-Cnt) As TeamId

resident Tmp Group By EmployeeId;

Data:

LOAD EmployeeId,ApplyMap('Map',EmployeeId) As TeamId Resident Tmp;

Drop Table Tmp;

stigchel · ‎2016-09-05

Example attached

ArnadoSandoval · ‎2016-09-06

G'Day Paul,

I will suggest something different, you should use the Aggregated function Mode() to resolve your problem:

I added more employees to your sample in order to illustrate how to work with the Mode() function as there is something tricky with its behavior; I kept Paul and added some random WTA Tennis players names; Serena associated with one team, Angie appearing in two teams; Garbine appearing once in three teams and finally Simona also appearing in three teams, but twice in teams 1 and 2.

All these employees were loaded into a tmp table with the standard LOAD * INLINE statement; next we need a Temporary Table, which I named T1:

NOTE: I added an extra record for Simona where she belong to the TeamId 4.

The table T1 shows the employees belonging to teams having the maximum number of occurrences as highlighted on the column filter; Aga's team 2; Angie's teams 1 and 2; Garbine's teams 1, 2 and 3; Paul's team 2; Serena's team 1 and Simona teams 1 and 2.

This is the script building the table t1:

t1:

LOAD t1_EmployeeId,

t1_MyTeam,

t1_TeamId_Cnt,

t1_EmployeeId & '|' & t1_TeamId_Cnt as filter;

Load

EmployeeId as t1_EmployeeId,

TeamId as t1_MyTeam,

Count(TeamId) as t1_TeamId_Cnt

Resident Tmp

Group by EmployeeId, TeamId;

Now we need a second temporary table, which I named t2, this table identifies the records where each employee has the maximum number of occurrences in a team, which is the column filter on the table T1.

This is the script creating table T2:

t2:

Load

t2_EmployeeId,

t2_TeamId_Cnt,

t2_EmployeeId & '|' & t2_TeamId_Cnt as filter3;

LOAD

t1_EmployeeId as t2_EmployeeId,

Max(t1_TeamId_Cnt) as t2_TeamId_Cnt

Resident t1

group by t1_EmployeeId;

Now, if we join the tables 1 and 2 on the filter and filter3 columns we get the records where the employee is assigned to TeamId most often; we do that by creating table T3.

NOTE: I am explaining this solution step by step, that is why many temporary tables where introduced, it is up to you to understand the logic and remove excessive temporary tables.

The table T3 looks like this:

In this table, the Teams Id for each employee are the teams where they were assigned most often; Aga, Paul and Serena appear only once because they were assigned to their teams more often, Aga twice to team 2, Paul four times to team 2, while Serena was assigned to team 1 only one time. Angie, Garbine and Simona are appearing multiple times because they were assigned to teams the same number of time; notice that Simona's teams 3 and 4 do not appear because she was assigned twice to teams 1 and 2.

All these temporary tables are required to address the scenario illustrated by Simona, where she was assigned to multiple teams, with single assignments to teams ID greater than the teams where she was assigned more often.

The script building table 3 is shown next (NOTE: I did not reference the table T1, I just re-coded its script, this will allow me to select the records on the column filter3)

t3:

LOAD

t1_EmployeeId as t3_EmployeeId,

t1_MyTeam as t3_MyTeam

Where Exists(filter3);

LOAD t1_EmployeeId,

t1_MyTeam,

t1_TeamId_Cnt,

t1_EmployeeId & '|' & t1_TeamId_Cnt as filter3;

Load

EmployeeId as t1_EmployeeId,

TeamId as t1_MyTeam,

Count(TeamId) as t1_TeamId_Cnt

Resident Tmp

Group by EmployeeId, TeamId;

The final step is the one that resolve your problem, we created the table Result

The script creating this table is shown below:

Result:

Load

t3_EmployeeId as EmployeeId,

if(IsNull(Mode(t3_MyTeam)) = -1, Max(t3_MyTeam), Mode(t3_MyTeam)) as MyTeam

Resident t3

Group by t3_EmployeeId;

Here the focus is with the column MyTeam:

if(IsNull(Mode(t3_MyTeam)) = -1, Max(t3_MyTeam, 2), Mode(t3_MyTeam)) as MyTeam

The Mode(t3_MyTeam) function returns NULL (value -1) when an employee belong to more than one Team the same number of times, which is the case for Garbine, Angie and Simona in those cases, we picked up the Maximum Team Id.

I will paste the whole script next:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Tmp:

Load * Inline [EmployeeId,TeamId

Paul,1

Paul,2

Serena, 1

Angie, 1

Angie, 2

Garbine, 1

Garbine, 2

Garbine, 3

Aga, 1

Aga, 2

Aga, 3

Simona, 1

Simona, 2

Simona, 3

Simona, 4

];

t1:

LOAD t1_EmployeeId,

t1_MyTeam,

t1_TeamId_Cnt,

t1_EmployeeId & '|' & t1_TeamId_Cnt as filter;

Load

EmployeeId as t1_EmployeeId,

TeamId as t1_MyTeam,

Count(TeamId) as t1_TeamId_Cnt

Resident Tmp

Group by EmployeeId, TeamId;

NoConcatenate

t2:

Load

t2_EmployeeId,

t2_TeamId_Cnt,

t2_EmployeeId & '|' & t2_TeamId_Cnt as filter3;

LOAD

t1_EmployeeId as t2_EmployeeId,

Max(t1_TeamId_Cnt) as t2_TeamId_Cnt

Resident t1

group by t1_EmployeeId;

NoConcatenate

t3:

LOAD

t1_EmployeeId as t3_EmployeeId,

t1_MyTeam as t3_MyTeam

Where Exists(filter3);

LOAD t1_EmployeeId,

t1_MyTeam,

t1_TeamId_Cnt,

t1_EmployeeId & '|' & t1_TeamId_Cnt as filter3;

Load

EmployeeId as t1_EmployeeId,

TeamId as t1_MyTeam,

Count(TeamId) as t1_TeamId_Cnt

Resident Tmp

Group by EmployeeId, TeamId;

Result:

Load

t3_EmployeeId as EmployeeId,

if(IsNull(Mode(t3_MyTeam)) = -1, Max(t3_MyTeam), Mode(t3_MyTeam)) as MyTeam

Resident t3

Group by t3_EmployeeId;

Base_Data:

Load

EmployeeId as Data_EmployeeId,

TeamId as Data_TeamId,

RecNo() as RecNo

Resident Tmp;

Drop table Tmp;

DROP Table t1;

DROP Table t2;

DROP Table t3;

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Hope this helps

Arnaldo Sandoval
A journey of a thousand miles begins with a single step.

sunny_talwar · ‎2016-09-06

My bad, I never really got to test the script I was proposing. This is what I wanted to suggest

Table:

Load * Inline [

EmployeeId,TeamId

Paul,1

Paul,2

];

Right Join (Table)

LOAD EmployeeId,

FirstSortedValue(TeamId, -Count) as TeamId

Group By EmployeeId;

LOAD EmployeeId,

TeamId,

Count(TeamId) as Count

Resident Table

Group By EmployeeId, TeamId;

MappingTable:

Mapping

LOAD DISTINCT EmployeeId,

TeamId

Resident Table;

DROP Table Table;