Solved: Filter out unwanted rows - Qlik Community

mulan1 · ‎2024-10-19

hello

I have Employees tbl. For some emp I have this problem:

EmpID	Department	Manager	StartDate	EndDate
1	Management	70362	06/10/2019	01/01/2024
1	Management	4550	06/10/2019	01/01/2024
1	sales	70362	01/01/2018	05/10/2019

I assume that the first row is a mistake - in which the employee changed departments but the manager remained the same manager. It is kind of a duplicate row with wrong Manager. I don't want to load rows like these - the EmpID is the same, the startDate is different, the department is different but the manager remains the same as the manager of the previous department.

I tried with Previous() and with Exist(), but I didn't get the desired result.
Thanks in advance to the helpers.

rubenmarin · ‎2024-10-19

Hi, if you do a sorted load by EmpId, StartDate and manager, you can flag the records where all the fields were equal as the previous row but the department changes.

Just be carefull in case there is the possibility that both, the emp and the manager really changed department, in that case maybe you need an extra check for records with 2 equal StarDates.

LOAD
  EmpID,
  StartDate,
  Manager,
  Department,
  If(EmpID=Peek('EmpID') and Manager=Peek('Manager') and Department<>Peek('EmpID'), 1) as isDuplicated
Resident DataTable
Order By EmpID, StartDate, Manager;

View solution in original post

Kushal_Chawda · ‎2024-10-19

@mulan1 You need to sort your data first for peek and previous function to work as expected. I don't think this scenario covers all cases. In rare cases it might be possible to have a same manager for current and previous department. But for this alone scenario below should work

Data:
Load * Inline [
EmpID, Department, Manager, StartDate, EndDate,
1, Management, 70362, 06/10/2019, 01/01/2024
1, Management, 4550, 06/10/2019, 01/01/2024
1, sales, 70362, 01/01/2018, 05/10/2019

2, Procurement, 40321, 06/10/2018, 01/01/2020
2, Sales, 5043, 06/01/2020, 01/01/2024
2, Sales, 40321, 06/01/2020, 01/01/2024
];

New:
Load *,
     if(EmpID=Previous(EmpID),if( Department <>Previous(Department),Previous(Manager),
     Peek('Manager_Exists'))) as Manager_Exists
Resident Data
Order by EmpID,StartDate,Manager;

Drop Table Data;

Final:
NoConcatenate
Load *
Resident New
where Manager<>Manager_Exists;

Drop Table New;

drop Field Manager_Exists;

View solution in original post

Qrishna · ‎2024-10-19

How can you say only this row

1

Management

70362

06/10/2019

01/01/2024

is a duplicated row on what basis/criteria? the other row can also be duplicated right? if you have thousand of records, is there any criteria applicable to figure out the duplicated rows? lets say, like you mentioned, all the rows above the row where EmpID = prev(EmpID) and Department= prev(Department) and Manager<> prev(Manager) and StartDate= prev(StartDate) and EndDate= prev(EndDate) should be filtered out? if thats the case you could simply do it where clause :

Data:
Load * Inline [
EmpID, Department, Manager, StartDate, EndDate,
1, Management, 70362, 06/10/2019, 01/01/2024
1, Management, 4550, 06/10/2019, 01/01/2024
1, sales, 70362, 01/01/2018, 05/10/2019
];

Qualify *;
U:
Load * Resident Data
where not (EmpID = previous(EmpID) and Department= previous(Department) and Manager<> previous(Manager) and StartDate= previous(StartDate) and EndDate= previous(EndDate));

There are several ways to achieve this, if you wanna do with exist() we need a field to compare, or create some flag fields and filter out on UI, on UI you could do the same with above(), but i prefer doing most of the data transformation in load editor as we always have millions of records

rubenmarin · ‎2024-10-19

Hi, if you do a sorted load by EmpId, StartDate and manager, you can flag the records where all the fields were equal as the previous row but the department changes.

Just be carefull in case there is the possibility that both, the emp and the manager really changed department, in that case maybe you need an extra check for records with 2 equal StarDates.

LOAD
  EmpID,
  StartDate,
  Manager,
  Department,
  If(EmpID=Peek('EmpID') and Manager=Peek('Manager') and Department<>Peek('EmpID'), 1) as isDuplicated
Resident DataTable
Order By EmpID, StartDate, Manager;

Kushal_Chawda · ‎2024-10-19

@mulan1 You need to sort your data first for peek and previous function to work as expected. I don't think this scenario covers all cases. In rare cases it might be possible to have a same manager for current and previous department. But for this alone scenario below should work

Data:
Load * Inline [
EmpID, Department, Manager, StartDate, EndDate,
1, Management, 70362, 06/10/2019, 01/01/2024
1, Management, 4550, 06/10/2019, 01/01/2024
1, sales, 70362, 01/01/2018, 05/10/2019

2, Procurement, 40321, 06/10/2018, 01/01/2020
2, Sales, 5043, 06/01/2020, 01/01/2024
2, Sales, 40321, 06/01/2020, 01/01/2024
];

New:
Load *,
     if(EmpID=Previous(EmpID),if( Department <>Previous(Department),Previous(Manager),
     Peek('Manager_Exists'))) as Manager_Exists
Resident Data
Order by EmpID,StartDate,Manager;

Drop Table Data;

Final:
NoConcatenate
Load *
Resident New
where Manager<>Manager_Exists;

Drop Table New;

drop Field Manager_Exists;

mulan1 · ‎2024-10-21

I checked all your suggestions, and in the end a mix between the suggestions worked.
Thank you very much everyone!

Kushal_Chawda · ‎2024-10-22

@mulan1 Please mark the close suggestion as correct answer or share working script and mark it as correct answer to help others to know the correct solutions

Filter out unwanted rows

Qlik Sense

Script