Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello experts,
I have been trying to resolve an issue for over a week now and I think I'm just too close to see the issue. I have a table (below) that has HC_EID, HC_EMAIL, HC_BAND, HC_LEVEL_5 and DATA_LOAD_DATE in it. Each record is for a current employee and we are tracking weekly when they are in the data set and when they are not. So we have IN_PREV, meaning they were in the previous date's load. We also have IN_NEXT, meaning they were in the following date's data load in relation to the DATA_LOAD_DATE on that record. My syntax works as in it runs, but everyone ends up with only 0s in the IN_NEXT and no errors in the IN_PREV when I know there should be.
IN_PREV Logic:
For a given record for a given data_load_date
- if the record has a matching HC_EID, HC_EMAIL, HC_BAND, and HC_LEVEL_5 as the previous data_load_date then it should be a 1 (true)
- if the record has only a matching HC_EID as the previous data_load_date then it should be a 2 (err)
- if the record does not match any of the HC_EID, HC_EMAIL, HC_BAND, and HC_LEVEL_5 as the previous data_load_date then it should be a 0 (false)
IN_NEXT Logic:
For a given record for a given data_load_date
- if the record has a matching HC_EID, HC_EMAIL, HC_BAND, and HC_LEVEL_5 as the next data_load_date then it should be a 1 (true)
- if the record has only a matching HC_EID as the next data_load_date then it should be a 2 (err)
- if the record does not match any of the HC_EID, HC_EMAIL, HC_BAND, and HC_LEVEL_5 as the next data_load_date then it should be a 0 (false)
HC_EID HC_EMAIL HC_BAND HC_LEVEL_5 DATA_LOAD_DATE
123456 you@you.com 5 John Smith 3/3/2025
123456 me@me.com 7 Jane Doe 3/10/2025
123456 you@you.com 5 John Smith 3/17/2025
123456 you@you.com 5 John Smith 3/24/2025
123456 results should be:
for 03/03/2025 - IN_NEXT: 1 and IN_PREV: 0
for 03/10/2025 - IN_NEXT: 2 and IN_PREV: 2
for 03/17/2025 - IN_NEXT: 1 and IN_PREV: 2
for 03/24/2025 - IN_NEXT: 0 and IN_PREV: 1
// Load the raw_data QVD
Main_Data:
LOAD
*
FROM [$(vMainPath)QVD_files/TPD_Headcount_Main.qvd] (qvd);
// Create a distinct list of DATA_LOAD_DATE values and sort them
Dates:
LOAD DISTINCT
DATA_LOAD_DATE
RESIDENT Main_Data
ORDER BY DATA_LOAD_DATE ASC;
// Add PreviousDate and NextDate using Peek()
DatesWithPrevNext:
LOAD
DATA_LOAD_DATE,
Peek('DATA_LOAD_DATE', -1) AS PreviousDate,
Peek('DATA_LOAD_DATE', 1) AS NextDate
RESIDENT Dates;
// Create RawDataWithDates table
RawDataWithDates:
NOCONCATENATE
LOAD
*
RESIDENT Main_Data;
// Perform LEFT JOIN to add PreviousDate and NextDate to RawDataWithDates
LEFT JOIN (RawDataWithDates)
LOAD
DATA_LOAD_DATE,
PreviousDate,
NextDate
RESIDENT DatesWithPrevNext;
// Load the previous week's data for comparison
PreviousWeekData:
LOAD
HC_EID AS PW_ID,
HC_EMAIL AS PW_EMAIL,
HC_BAND AS PW_BAND,
HC_LEVEL_5 AS PW_LEVEL_5,
DATA_LOAD_DATE AS PreviousDate
RESIDENT RawDataWithDates
WHERE NOT ISNULL(PreviousDate);
// Load the next week's data for comparison
NextWeekData:
LOAD
HC_EID AS NW_ID,
HC_EMAIL AS NW_EMAIL,
HC_BAND AS NW_BAND,
HC_LEVEL_5 AS NW_LEVEL_5,
DATA_LOAD_DATE AS NextDate
RESIDENT RawDataWithDates
WHERE NOT ISNULL(NextDate);
// Join the previous week's data with the current data to calculate IN_PREV
LEFT JOIN (RawDataWithDates)
LOAD
PW_ID,
PW_EMAIL,
PW_BAND,
PW_LEVEL_5,
PreviousDate,
IF(
'PreviousWeekData.PW_ID' = 'RawDataWithDates.HC_EID' AND
'PreviousWeekData.PW_EMAIL' = 'RawDataWithDates.HC_EMAIL' AND
'PreviousWeekData.PW_BAND' = 'RawDataWithDates.HC_BAND' AND
'PreviousWeekData.PW_LEVEL_5' = 'RawDataWithDates.HC_LEVEL_5',
1, 0
) AS FoundInPrevFull,
IF(
'PreviousWeekData.PW_ID' = 'RawDataWithDates.HC_EID',
1, 0
) AS FoundInPrevID
RESIDENT PreviousWeekData;
// Join the next week's data with the current data to calculate IN_NEXT
LEFT JOIN (RawDataWithDates)
LOAD
NW_ID,
NW_EMAIL,
NW_BAND,
NW_LEVEL_5,
NextDate,
IF(
'NextWeekData.NW_ID' = 'RawDataWithDates.HC_EID' AND
'NextWeekData.NW_EMAIL' = 'RawDataWithDates.HC_EMAIL' AND
'NextWeekData.NW_BAND' = 'RawDataWithDates.HC_BAND' AND
'NextWeekData.NW_LEVEL_5' = 'RawDataWithDates.HC_LEVEL_5',
1, 0
) AS FoundInNextFull,
IF(
'NextWeekData.NW_ID' = 'RawDataWithDates.HC_EID',
1, 0
) AS FoundInNextID
RESIDENT NextWeekData;
// Calculate the IN_PREV and IN_NEXT columns
FinalData:
LOAD
*,
IF(FoundInPrevFull = 0, 1,
IF(FoundInPrevID = 0, 2, 0)) AS IN_PREV,
IF(FoundInNextFull = 0, 1,
IF(FoundInNextID = 0, 2, 0)) AS IN_NEXT
RESIDENT RawDataWithDates;
// Drop intermediate tables
DROP TABLE Main_Data;
DROP TABLE Dates;
DROP TABLE DatesWithPrevNext;
DROP TABLE RawDataWithDates;
DROP TABLE PreviousWeekData;
DROP TABLE NextWeekData;
What if you just read the datset twice with different sort order and do conditional peeks . See my suggestion for your sample dataset below.
ps- I tried several times to add a proper table and got the 'bad html was removed' issue along with not being able to format the code; not sure why but I have no 'select html option in the header' per the instructions.
What if you just read the datset twice with different sort order and do conditional peeks . See my suggestion for your sample dataset below.
Thank you sir, but the result of this is all the IN_NEXT and IN_PREV come out as zeros.
I have been using GitHub Copilot AI assistant with this project to get the bulk of the script, it would suggest a solution, then I'd try it, and if it didn't work I'd go back. With @Vegar script it was able to modify it slightly to get the correct answer, so I'll Accept as Solution but also wanted to share the working script. The main difference was the order by clause :
// Step 1: Load the main data and create a unique identifier
Main_Data:
LOAD
*,
AutoNumberHash128(HC_EID, HC_EMAIL, HC_BAND, HC_LEVEL_5) AS HC_REC_UNIQUE_IDENTIFIER
FROM [$(vMainPath)QVD_files/TPD_Headcount_Main.qvd] (qvd);
// Step 2: Calculate IN_PREV
PREV_ADDED:
LOAD *,
IF(
Peek('HC_EID') = HC_EID,
IF(Peek('HC_REC_UNIQUE_IDENTIFIER') = HC_REC_UNIQUE_IDENTIFIER, 1, 2),
0
) AS IN_PREV
RESIDENT Main_Data
ORDER BY HC_EID, DATA_LOAD_DATE ASC; // Sort by HC_EID and ascending DATA_LOAD_DATE
DROP TABLE Main_Data; // Drop the original table
// Step 3: Calculate IN_NEXT
NEXT_ADDED:
LOAD *,
IF(
Peek('HC_EID') = HC_EID,
IF(Peek('HC_REC_UNIQUE_IDENTIFIER') = HC_REC_UNIQUE_IDENTIFIER, 1, 2),
0
) AS IN_NEXT
RESIDENT PREV_ADDED
ORDER BY HC_EID, DATA_LOAD_DATE DESC; // Sort by HC_EID and descending DATA_LOAD_DATE
DROP TABLE PREV_ADDED; // Drop the intermediate table
// Step 4: Cleanup
DROP FIELD HC_REC_UNIQUE_IDENTIFIER;