Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a QVD with 2M rows. This QVD gets loaded by concatenating 280 excel files. This is employee data from our HR system. The same employee_id could be in this QVD 280 times. So I want to load distinct employee_id so I don't have duplicates. But I want to load the field values from the last row for this employee.
Every day there will be a new excel file to load. So I can't hard code anything.
I tried to add max() to the fields thinking I would get the last one. But got an invalid expression error when loading.
Here is the script without any logic to get the last row for each employee. What can I add to this logic to achieve what I want to do.
Load Distinct Employee_ID,
Full_Legal_Name,
Hire_Date_Most_Recent,
CurrentActive
FROM [lib://DEV/QVDWDay\WD_EmployeeMasterAll.QVD] (qvd);
So again I would like to have one row for each employee with the latest values for each of the fields. There are more fields than these. But if I can get what I want to work for these I can do the rest of them.
Hi
I assume you want the latest Hire_Date_Most_Recent?
If so one way is to use
>>> order by Hire_Date_Most_Recent desc (after initial load. It can only be done using residential load)
and >>> where not exists (Distinct Employee_ID)
of course you would need to rename Distinct Employee_ID (where not exists). Or restrict just to the second table
something like
TEMPLOAD:
Load
Employee_ID,
Full_Legal_Name,
Hire_Date_Most_Recent,
CurrentActive
FROM [lib://DEV/QVDWDay\WD_EmployeeMasterAll.QVD] (qvd);
DATA:
LOAD
*
where not exists (EMPLOIDTEMP) ;
Load
Employee_ID,
Employee_ID AS EMPLOIDTEMP,
Full_Legal_Name,
Hire_Date_Most_Recent,
CurrentActive
RESIDENT TEMPLOAD
order by Hire_Date_Most_Recent desc ;
DROP TABLE TEMPLOAD ;
DROP FIELD EMPLOIDTEMP ;
This should help you:
TABLE:
Load Distinct Employee_ID,
Full_Legal_Name,
Hire_Date_Most_Recent,
CurrentActive
FROM [lib://DEV/QVDWDay\WD_EmployeeMasterAll.QVD] (qvd);
INNER JOIN (TABLE) LOAD
Employee_ID,
max(Hire_Date_Most_Recent) as Hire_Date_Most_Recent
RESIDENT TABLE
GROUP BY Employee_ID;
Probably the Hire_Date_Most_Recent is not the field you want to use, I just use the only date field I see here. Hope you'll figure out what field you need.
Yes,
Instead of concatenating the new file to the old one, First load the new file then concatenate the old one with Where exists Clause - Where Not exists (employee).
Eg:
Load Distinct Employee_ID,
Full_Legal_Name,
Hire_Date_Most_Recent,
CurrentActive
FROM
NewFile;
Concatenate
Full_Legal_Name,
Hire_Date_Most_Recent,
CurrentActiv
FROM
OldFile
Where
Not Exists(Full_Legal_Name);
or
If you want to keep the employee latest data from the final QVD then there should be a date field to sort out and keep the latest records