Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I have employee data where records are added for actions like department changes, supervisor changes, name changes, department name changes, etc. Ultimately I want to identify what Supervisor an employee reported to for a given time period; however, I have multiple rows for a single supervisor that I need to consolidate.
I know I can do a group by clause to get min/max dates but:
Attached is sample data and the desired output to illustrate the problem. I'm looking to solve for this within the Data Load Editor.
Hello fietzn,
You have to create checks for changes between rows.
ID field does the work and create a counter when there is a change in the controls.
Afterwards, you can do grouping.
Tmp:
LOAD *,
if(RecNo() = 1, 1, if(Previous(EMPLOYEE_ID) = EMPLOYEE_ID and Previous(SUPERVISOR_ID) = SUPERVISOR_ID
and (Previous(EFFECTIVE_STOP) + 1 = EFFECTIVE_START or Previous(EFFECTIVE_STOP) = EFFECTIVE_START), Peek(ID), Peek(ID) + 1)) as ID
INLINE [
EMPLOYEE_ID SUPERVISOR_ID EFFECTIVE_START EFFECTIVE_STOP
1234 3707 8/18/2016 9/25/2017
1234 3707 9/26/2017 7/9/2018
1234 5020 7/10/2018 12/25/2018
1234 5020 12/26/2018 4/15/2019
1234 5020 4/16/2019 5/8/2019
1234 5020 5/9/2019 1/30/2020
1234 3840 1/31/2020 2/17/2020
1234 3840 2/18/2020 5/3/2020
1234 5020 5/4/2020 9/1/2020
1234 5020 9/2/2020 9/29/2020
1234 5020 9/30/2020 10/3/2020
1234 5020 10/4/2020 1/5/2021
1234 5020 1/6/2021 3/24/2021
1234 6179 3/25/2021 4/6/2021
1234 6179 4/7/2021 4/19/2021
1234 6179 4/20/2021 5/8/2021
1234 6179 5/9/2021 9/22/2021
1234 6179 9/23/2021 9/26/2021
1234 6179 9/26/2021 1/25/2022
1234 6179 1/26/2022 3/1/2022
1234 6179 3/2/2022 4/30/2022
1234 22732 5/1/2022 5/30/2022
1234 12707 5/31/2022 8/6/2022
1234 12782 8/7/2022 11/3/2022
1234 12782 11/4/2022 12/31/9000
] (delimiter is '\t');
Result:
NoConcatenate
LOAD EMPLOYEE_ID, SUPERVISOR_ID, ID,
Min(EFFECTIVE_START) as EFFECTIVE_START, Max(EFFECTIVE_STOP) as EFFECTIVE_STOP
Resident Tmp
GROUP BY EMPLOYEE_ID, SUPERVISOR_ID, ID;
DROP TABLE Tmp;
DROP FIELD ID FROM Result;
I hope it resolves your problem.
Hello fietzn,
You have to create checks for changes between rows.
ID field does the work and create a counter when there is a change in the controls.
Afterwards, you can do grouping.
Tmp:
LOAD *,
if(RecNo() = 1, 1, if(Previous(EMPLOYEE_ID) = EMPLOYEE_ID and Previous(SUPERVISOR_ID) = SUPERVISOR_ID
and (Previous(EFFECTIVE_STOP) + 1 = EFFECTIVE_START or Previous(EFFECTIVE_STOP) = EFFECTIVE_START), Peek(ID), Peek(ID) + 1)) as ID
INLINE [
EMPLOYEE_ID SUPERVISOR_ID EFFECTIVE_START EFFECTIVE_STOP
1234 3707 8/18/2016 9/25/2017
1234 3707 9/26/2017 7/9/2018
1234 5020 7/10/2018 12/25/2018
1234 5020 12/26/2018 4/15/2019
1234 5020 4/16/2019 5/8/2019
1234 5020 5/9/2019 1/30/2020
1234 3840 1/31/2020 2/17/2020
1234 3840 2/18/2020 5/3/2020
1234 5020 5/4/2020 9/1/2020
1234 5020 9/2/2020 9/29/2020
1234 5020 9/30/2020 10/3/2020
1234 5020 10/4/2020 1/5/2021
1234 5020 1/6/2021 3/24/2021
1234 6179 3/25/2021 4/6/2021
1234 6179 4/7/2021 4/19/2021
1234 6179 4/20/2021 5/8/2021
1234 6179 5/9/2021 9/22/2021
1234 6179 9/23/2021 9/26/2021
1234 6179 9/26/2021 1/25/2022
1234 6179 1/26/2022 3/1/2022
1234 6179 3/2/2022 4/30/2022
1234 22732 5/1/2022 5/30/2022
1234 12707 5/31/2022 8/6/2022
1234 12782 8/7/2022 11/3/2022
1234 12782 11/4/2022 12/31/9000
] (delimiter is '\t');
Result:
NoConcatenate
LOAD EMPLOYEE_ID, SUPERVISOR_ID, ID,
Min(EFFECTIVE_START) as EFFECTIVE_START, Max(EFFECTIVE_STOP) as EFFECTIVE_STOP
Resident Tmp
GROUP BY EMPLOYEE_ID, SUPERVISOR_ID, ID;
DROP TABLE Tmp;
DROP FIELD ID FROM Result;
I hope it resolves your problem.
Thank you @SerhanKaraer !