Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
I have a data table such as the below.
| Identifier | Date | Class |
| Alpha | 05/31/2017 | A |
| Alpha | 05/31/2017 | B |
| Alpha | 05/31/2017 | C |
| Alpha | 05/31/2017 | D |
| Alpha | 07/31/2017 | A |
| Alpha | 07/31/2017 | B |
Over time Classes will be removed from Identifier & Date combinations. In the above on 5/31/2017 Alpha had A,B,C,D for Classes while on 7/31/2017 it only had A,B. I'd like to create a table in my data model such as the below.
| Identifier | Date | Dropped Classes |
| Alpha | 07/31/2017 | C,D |
Does anyone have any ideas?
Thanks,
Mark
Try something like this
Table:
LOAD *,
Hash128(Identifier&'|'&Date&'|'&Class) as Key;
LOAD * INLINE [
Identifier, Date, Class
Alpha, 05/31/2017, A
Alpha, 05/31/2017, B
Alpha, 05/31/2017, C
Alpha, 05/31/2017, D
Alpha, 07/31/2017, A
Alpha, 07/31/2017, B
];
TempTable:
LOAD Distinct Identifier,
Class
Resident Table;
Left Join (TempTable)
LOAD DISTINCT Date
Resident Table;
DroppedClasses:
LOAD Identifier,
Date,
Concat(Class, ', ') as [Dropped Classes]
Resident TempTable
Where not Exists(Key, Hash128(Identifier&'|'&Date&'|'&Class))
Group By Identifier, Date;
DROP Table TempTable;
Hi Sunny,
This works very well but when I try and expand it to a larger data set it does not quite work as I want. For example if I add new date row
| Identifier | Date | Class |
| Alpha | 05/31/2017 | A |
| Alpha | 05/31/2017 | B |
| Alpha | 05/31/2017 | C |
| Alpha | 05/31/2017 | D |
| Alpha | 07/31/2017 | A |
| Alpha | 07/31/2017 | B |
| Alpha | 08/31/2017 | A |
Using the script you provided the output of this would be
| Identifier | Date | Dropped Classes |
| Alpha | 07/31/2017 | C,D |
| Alpha | 08/31/2017 | B,C,D |
Where what I am looking for is.
| Identifier | Date | Dropped Classes |
| Alpha | 07/31/2017 | C,D |
| Alpha | 08/31/2017 | B |
The current script looks to the whole set of classes for each date where I'd like to look to the most recent set.
Thanks,
Mark
Try it out with some modifications
Table:
LOAD *,
Hash128(Identifier&'|'&Date&'|'&Class) as Key;
LOAD * INLINE [
Identifier, Date, Class
Alpha, 05/31/2017, A
Alpha, 05/31/2017, B
Alpha, 05/31/2017, C
Alpha, 05/31/2017, D
Alpha, 07/31/2017, A
Alpha, 07/31/2017, B
Alpha, 08/31/2017, A
];
TempTable:
LOAD Distinct Identifier,
Class
Resident Table;
Left Join (TempTable)
LOAD DISTINCT Date
Resident Table;
TempDroppedClasses:
LOAD *
Where NOT Exists(New_Class);
LOAD Identifier,
Date,
Class as New_Class
Resident TempTable
Where not Exists(Key, Hash128(Identifier&'|'&Date&'|'&Class))
Order By Identifier, Class, Date;
DroppedClasses:
LOAD Identifier,
Date,
Concat(New_Class, ', ') as [Dropped Classes]
Resident TempDroppedClasses
Group By Identifier, Date;
DROP Table TempTable, TempDroppedClasses;