Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
hello
I have Employees tbl. For some emp I have this problem:
EmpID | Department | Manager | StartDate | EndDate |
1 | Management | 70362 | 06/10/2019 | 01/01/2024 |
1 | Management | 4550 | 06/10/2019 | 01/01/2024 |
1 | sales | 70362 | 01/01/2018 | 05/10/2019 |
I assume that the first row is a mistake - in which the employee changed departments but the manager remained the same manager. It is kind of a duplicate row with wrong Manager. I don't want to load rows like these - the EmpID is the same, the startDate is different, the department is different but the manager remains the same as the manager of the previous department.
I tried with Previous() and with Exist(), but I didn't get the desired result.
Thanks in advance to the helpers.
Hi, if you do a sorted load by EmpId, StartDate and manager, you can flag the records where all the fields were equal as the previous row but the department changes.
Just be carefull in case there is the possibility that both, the emp and the manager really changed department, in that case maybe you need an extra check for records with 2 equal StarDates.
LOAD
EmpID,
StartDate,
Manager,
Department,
If(EmpID=Peek('EmpID') and Manager=Peek('Manager') and Department<>Peek('EmpID'), 1) as isDuplicated
Resident DataTable
Order By EmpID, StartDate, Manager;
@mulan1 You need to sort your data first for peek and previous function to work as expected. I don't think this scenario covers all cases. In rare cases it might be possible to have a same manager for current and previous department. But for this alone scenario below should work
Data:
Load * Inline [
EmpID, Department, Manager, StartDate, EndDate,
1, Management, 70362, 06/10/2019, 01/01/2024
1, Management, 4550, 06/10/2019, 01/01/2024
1, sales, 70362, 01/01/2018, 05/10/2019
2, Procurement, 40321, 06/10/2018, 01/01/2020
2, Sales, 5043, 06/01/2020, 01/01/2024
2, Sales, 40321, 06/01/2020, 01/01/2024
];
New:
Load *,
if(EmpID=Previous(EmpID),if( Department <>Previous(Department),Previous(Manager),
Peek('Manager_Exists'))) as Manager_Exists
Resident Data
Order by EmpID,StartDate,Manager;
Drop Table Data;
Final:
NoConcatenate
Load *
Resident New
where Manager<>Manager_Exists;
Drop Table New;
drop Field Manager_Exists;
How can you say only this row
1 | Management | 70362 | 06/10/2019 | 01/01/2024 |
is a duplicated row on what basis/criteria? the other row can also be duplicated right? if you have thousand of records, is there any criteria applicable to figure out the duplicated rows? lets say, like you mentioned, all the rows above the row where EmpID = prev(EmpID) and Department= prev(Department) and Manager<> prev(Manager) and StartDate= prev(StartDate) and EndDate= prev(EndDate) should be filtered out? if thats the case you could simply do it where clause :
Data:
Load * Inline [
EmpID, Department, Manager, StartDate, EndDate,
1, Management, 70362, 06/10/2019, 01/01/2024
1, Management, 4550, 06/10/2019, 01/01/2024
1, sales, 70362, 01/01/2018, 05/10/2019
];
Qualify *;
U:
Load * Resident Data
where not (EmpID = previous(EmpID) and Department= previous(Department) and Manager<> previous(Manager) and StartDate= previous(StartDate) and EndDate= previous(EndDate));
There are several ways to achieve this, if you wanna do with exist() we need a field to compare, or create some flag fields and filter out on UI, on UI you could do the same with above(), but i prefer doing most of the data transformation in load editor as we always have millions of records
Hi, if you do a sorted load by EmpId, StartDate and manager, you can flag the records where all the fields were equal as the previous row but the department changes.
Just be carefull in case there is the possibility that both, the emp and the manager really changed department, in that case maybe you need an extra check for records with 2 equal StarDates.
LOAD
EmpID,
StartDate,
Manager,
Department,
If(EmpID=Peek('EmpID') and Manager=Peek('Manager') and Department<>Peek('EmpID'), 1) as isDuplicated
Resident DataTable
Order By EmpID, StartDate, Manager;
@mulan1 You need to sort your data first for peek and previous function to work as expected. I don't think this scenario covers all cases. In rare cases it might be possible to have a same manager for current and previous department. But for this alone scenario below should work
Data:
Load * Inline [
EmpID, Department, Manager, StartDate, EndDate,
1, Management, 70362, 06/10/2019, 01/01/2024
1, Management, 4550, 06/10/2019, 01/01/2024
1, sales, 70362, 01/01/2018, 05/10/2019
2, Procurement, 40321, 06/10/2018, 01/01/2020
2, Sales, 5043, 06/01/2020, 01/01/2024
2, Sales, 40321, 06/01/2020, 01/01/2024
];
New:
Load *,
if(EmpID=Previous(EmpID),if( Department <>Previous(Department),Previous(Manager),
Peek('Manager_Exists'))) as Manager_Exists
Resident Data
Order by EmpID,StartDate,Manager;
Drop Table Data;
Final:
NoConcatenate
Load *
Resident New
where Manager<>Manager_Exists;
Drop Table New;
drop Field Manager_Exists;
I checked all your suggestions, and in the end a mix between the suggestions worked.
Thank you very much everyone!
@mulan1 Please mark the close suggestion as correct answer or share working script and mark it as correct answer to help others to know the correct solutions