Hope this can help you
SET MoneyFormat='€ #.##0,00;-€ #.##0,00';
SET TimestampFormat='DD/MM/YYYY hh:mm:ss[.fff]';
//load original data and count null for every rows
LOAD *, if(len(trim(Floor))=0,1,0) + if(len(trim(Room))=0,1,0) as FlagNullCount;
LOAD * INLINE [
EmployeeID, Department, Floor, Room, DateOfEntry
111, Accounting, , ,1/1/12
111, Accounting, , 201 ,2/2/12
111, Accounting, , ,3/3/12
222, HR, Three, 100, 1/1/12
222, Accounting, Four, 100, 2/2/12
222, HR, One, 100, 3/3/12
222, IS, Two, 500, 4/4/12
// flag record in the target dataset
if(isnull(EmployeeID) or Peek(EmployeeID)<>EmployeeID,1,0) as FlagInTargetDataset,
Order by EmployeeID, FlagNullCount, DateOfEntry desc;
DROP Table Tmp;
// make final table
NoConcatenate load * Resident Tmp2 Where FlagInTargetDataset=1;
DROP Table Tmp2;
102858.qvw 151.8 K
Probably it is possible to use an Aggregation operator and/or a Transformation operator with some special functions to write all this logic with LUA in a simpler dataflow. But I prefer to use as less LUA as possible to ease the maintenance and use different combinations of operators.
With your data I have created a dataflow to implement this logic and create the desired result:
Although it can look complex, it is not. There is three group of operators to perform the same process with the three fields used (Department, Floor and Room) and two joins at the end to mixed all fields.
My process is to calculate the number of iterations for every field value, then get the max iteration for every one, and join the max iteration with the iteration numbers to get only the records with max iterations. At the end, I order by date and get only the last value (last date), in case there are more than one record with the same number of iterations.
Attached you have the Expressor project and the source data I have used.
Hope this helps.
Wow! Tons of great information from all. Thanks!
Juan, your solution is a bit closer to what I was after. Thank you very much! I think I may have also simplified your solution. Have a look:
The first Aggregate Operators are exactly as you created them. But it appears that your 2nd Aggregate and 1st Join are not necessary when sorting by NumRoom and MaxDate.
(FYI - The Transforms in the screenshot aren't doing anything important)
Let me know if you think I'm missing something. But it's worked on every combination of data I threw at it. So we'll see how it does on a much larger dataset once I get a snapshot of the production data.