Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Join us in NYC Sept 4th for Qlik's AI Reality Tour! Register Now
cancel
Showing results for 
Search instead for 
Did you mean: 
mulan1
Contributor II
Contributor II

Filter out unwanted rows

hello

I have Employees tbl. For some emp I have this problem: 

EmpID Department Manager StartDate EndDate
1 Management 70362 06/10/2019 01/01/2024
1 Management 4550 06/10/2019 01/01/2024
1 sales 70362 01/01/2018 05/10/2019

 

I assume that the first row is a mistake - in which the employee changed departments but the manager remained the same manager. It is kind of a duplicate row with wrong Manager. I don't want to load rows like these -  the EmpID is the same, the startDate is different, the department is different but the manager remains the same as the manager of the previous department. 

I tried with Previous() and with Exist(), but I didn't get the desired result.
Thanks in advance to the helpers.

Labels (2)
2 Solutions

Accepted Solutions
rubenmarin1

Hi, if you do a sorted load by EmpId, StartDate and manager, you can flag the records where all the fields were equal as the previous row but the department changes.

Just be carefull in case there is the possibility that both, the emp and the manager really changed department, in that case maybe you need an extra check for records with 2 equal StarDates.

LOAD
  EmpID,
  StartDate,
  Manager,
  Department,
  If(EmpID=Peek('EmpID') and Manager=Peek('Manager') and Department<>Peek('EmpID'), 1) as isDuplicated
Resident DataTable
Order By EmpID, StartDate, Manager;

 

View solution in original post

Kushal_Chawda

@mulan1  You need to sort your data first for peek and previous function to work as expected. I don't think this scenario covers all cases. In rare cases it might be possible to have a same manager for current and previous department. But for this alone scenario below should work

 

Data:
Load * Inline [
EmpID, Department, Manager, StartDate, EndDate,
1, Management, 70362, 06/10/2019, 01/01/2024
1, Management, 4550, 06/10/2019, 01/01/2024
1, sales, 70362, 01/01/2018, 05/10/2019

2, Procurement, 40321, 06/10/2018, 01/01/2020
2, Sales, 5043, 06/01/2020, 01/01/2024
2, Sales, 40321, 06/01/2020, 01/01/2024
];

New:
Load *,
     if(EmpID=Previous(EmpID),if( Department <>Previous(Department),Previous(Manager),
     Peek('Manager_Exists'))) as Manager_Exists
Resident Data
Order by EmpID,StartDate,Manager;

Drop Table Data;

Final:
NoConcatenate
Load *
Resident New
where Manager<>Manager_Exists;

Drop Table New;

drop Field Manager_Exists;

 

 

Screenshot 2024-10-19 203700.png

View solution in original post

5 Replies
Qrishna
Master
Master

How can you say only this row

1 Management 70362 06/10/2019 01/01/2024

is a duplicated row on  what basis/criteria? the other row can also be duplicated right? if you have thousand of records, is there any criteria applicable to figure out the duplicated rows? lets say, like you mentioned, all the rows above the row where EmpID = prev(EmpID) and  Department= prev(Department) and Manager<> prev(Manager) and StartDate= prev(StartDate) and EndDate= prev(EndDate) should be filtered out? if thats the case you could simply do it where clause : 

 

Data:
Load * Inline [
EmpID, Department, Manager, StartDate, EndDate,
1, Management, 70362, 06/10/2019, 01/01/2024
1, Management, 4550, 06/10/2019, 01/01/2024
1, sales, 70362, 01/01/2018, 05/10/2019
];


Qualify *;
U:
Load * Resident Data
where not (EmpID = previous(EmpID) and Department= previous(Department) and Manager<> previous(Manager) and StartDate= previous(StartDate) and EndDate= previous(EndDate));

2488039 - Filter out duplicated unwanted previous rows.PNG

There are several ways to achieve this, if you wanna do with exist() we need a field to compare, or create some flag fields and filter out on UI, on UI you could do the same with above(), but i prefer doing most of the data transformation in load editor as we always have millions of records

 

 

 

rubenmarin1

Hi, if you do a sorted load by EmpId, StartDate and manager, you can flag the records where all the fields were equal as the previous row but the department changes.

Just be carefull in case there is the possibility that both, the emp and the manager really changed department, in that case maybe you need an extra check for records with 2 equal StarDates.

LOAD
  EmpID,
  StartDate,
  Manager,
  Department,
  If(EmpID=Peek('EmpID') and Manager=Peek('Manager') and Department<>Peek('EmpID'), 1) as isDuplicated
Resident DataTable
Order By EmpID, StartDate, Manager;

 

Kushal_Chawda

@mulan1  You need to sort your data first for peek and previous function to work as expected. I don't think this scenario covers all cases. In rare cases it might be possible to have a same manager for current and previous department. But for this alone scenario below should work

 

Data:
Load * Inline [
EmpID, Department, Manager, StartDate, EndDate,
1, Management, 70362, 06/10/2019, 01/01/2024
1, Management, 4550, 06/10/2019, 01/01/2024
1, sales, 70362, 01/01/2018, 05/10/2019

2, Procurement, 40321, 06/10/2018, 01/01/2020
2, Sales, 5043, 06/01/2020, 01/01/2024
2, Sales, 40321, 06/01/2020, 01/01/2024
];

New:
Load *,
     if(EmpID=Previous(EmpID),if( Department <>Previous(Department),Previous(Manager),
     Peek('Manager_Exists'))) as Manager_Exists
Resident Data
Order by EmpID,StartDate,Manager;

Drop Table Data;

Final:
NoConcatenate
Load *
Resident New
where Manager<>Manager_Exists;

Drop Table New;

drop Field Manager_Exists;

 

 

Screenshot 2024-10-19 203700.png

mulan1
Contributor II
Contributor II
Author

I checked all your suggestions, and in the end a mix between the suggestions worked.
Thank you very much everyone!

Kushal_Chawda

@mulan1  Please mark the close suggestion as correct answer or share working script and mark it as correct answer to help others to know the correct solutions