Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi everyone,
My HR data model is made mainly with slowly changing dimensions, which I have combined into one 'Sub Intervals' table.
I have six tables of data HR dependent on the time that I had to control using the interval match and subintervlas.
Basically my end result is being able to choose a point in time in calendar, and see the state of my employees based on the different slowly changing dimensions.
Data model:
I have two cases:
1. When I round up date(ceil(SubIntervalEnd)) in script I have problem with SubIntervals table, this table create one additional interval (31.12.2016 - 01.01.2017) which is wrong and if I didn't remove this row from table I have wrong result, when I choose last date in calendar.
2. When I don't round up date(ceil(SubIntervalEnd)), IntervalMatch in BridgeTableKalendar I don't see last day with connection to interval(01.01.2017).
Do you know what is the problem? Where I made a mistake and how can I solve it?
SubIntervalBegin - min date is 01.01.2014
SubIntervalEnd - max date is 01.01.2017
Script:
Let vEpsilon = Pow(2,-27);
//// ============ Load all distinct beginnings and endings Intervals ============
TempSubIntervals:
//Tabelee PA0000
Load distinct PA0000.PERSONALNUMMER_TMP as PERSONALNUMMER, PA0000.BEGDA0001 as SubIntervalBegin Resident PA0000;
Load distinct PA0000.PERSONALNUMMER_TMP as PERSONALNUMMER, PA0000.ENDDA0001 as SubIntervalBegin Resident PA0000;
//Tabelle PA0001
Load distinct PA0001.PERSONALNUMMER_TMP as PERSONALNUMMER, PA0001.BEGDA0001 as SubIntervalBegin Resident PA0001;
Load distinct PA0001.PERSONALNUMMER_TMP as PERSONALNUMMER, PA0001.ENDDA0001 as SubIntervalBegin Resident PA0001;
... for rest table script looks the same
// ============ Create the SubIntervals ============
SubIntervals:
Load*,
PERSONALNUMMER & '|' & SubIntervalBegin as SubIntervalID,
SubIntervalBegin & '-' & SubIntervalEnd as SubInterval.key;
Load
SubIntervalBegin as SubIntervalBegin ,
date(ceil(SubIntervalEnd)) ` as SubIntervalEnd,
PERSONALNUMMER,
PERSONALNUMMER as %PERSONALNUMMER.Personendaten,
Load PERSONALNUMMER,
SubIntervalBegin,
Dual(SubIntervalEnd, (SubIntervalEnd - $(#vEpsilon))) as SubIntervalEnd
Where not IsNull(SubIntervalEnd) ;
Load PERSONALNUMMER, SubIntervalBegin,
If(PERSONALNUMMER = Previous(PERSONALNUMMER), Previous(SubIntervalBegin)) as SubIntervalEnd
Resident TempSubIntervals
Order By PERSONALNUMMER, SubIntervalBegin Desc;
Drop Table TempSubIntervals;
TmpBridgeTablePA0000:
IntervalMatch (SubIntervalBegin, PERSONALNUMMER)
Load distinct PA0000.BEGDA0001, PA0000.ENDDA0001, PA0000.PERSONALNUMMER_TMP as PERSONALNUMMER resident PA0000;
BridgeTable_PA0000:
Load
PERSONALNUMMER & '|' & SubIntervalBegin as SubIntervalID,
PERSONALNUMMER & '|' & PA0000.BEGDA0001 & '|' & PA0000.ENDDA0001 as PA0000.OriginalIntervalID
Resident TmpBridgeTablePA0000;
Drop table TmpBridgeTablePA0000;
TmpBridgeTablePA0001:
IntervalMatch (SubIntervalBegin, PERSONALNUMMER)
Load distinct PA0001.BEGDA0001, PA0001.ENDDA0001, PA0001.PERSONALNUMMER_TMP as PERSONALNUMMER resident PA0001;
BridgeTable_PA0001:
Load
PERSONALNUMMER & '|' & SubIntervalBegin as SubIntervalID,
PERSONALNUMMER & '|' & PA0001.BEGDA0001 & '|' & PA0001.ENDDA0001 as PA0001.OriginalIntervalID
Resident TmpBridgeTablePA0001;
Drop table TmpBridgeTablePA0001;
... for rest table script looks the same
TMP_Kalender:
LOAD
date(MinimumDatum + IterNo() - 1) as Datum,
date(MinimumDatum + IterNo() - 1) as %DATUM,
Month(date(MinimumDatum + IterNo() - 1)) as Monat,
Year( date(MinimumDatum + IterNo() - 1)) as Jahr,
Week( date(MinimumDatum + IterNo() - 1)) as Woche
WHILE MinimumDatum + IterNo() - 1 <= MaximumDatum;
LOAD
min(PA0000.BEGDA0001) as MinimumDatum,
max(PA0000.ENDDA0001) as MaximumDatum
RESIDENT PA0000;
NoConcatenate
Kalender:
LOAD
%DATUM,
// date(ceil(%DATUM)) as %DATUM,
Datum as Kalender.Datum,
Monat as Kalender.Monat,
Jahr as Kalender.Jahr,
MonthEnd(Datum) as Kalender.MonatJahr,
if(Datum = date(floor(MonthEnd(Datum))), floor(MonthEnd(Datum)),'') as Kalender.MonatJahrUltimo
Resident TMP_Kalender;
BridgeTableKalendar_TMP:
IntervalMatch (%DATUM)
Load Distinct SubIntervalBegin, SubIntervalEnd Resident SubIntervals;
NoConcatenate
BridgeTableKalendar:
Load
(SubIntervalBegin & '-' & SubIntervalEnd) as SubInterval.key,
SubIntervalBegin as TMPSubIntervalBegin,
SubIntervalEnd as TMPSubIntervalEnd,
%DATUM
Resident BridgeTableKalendar_TMP;
Drop table BridgeTableKalendar_TMP;
Thanks!
I think you are following pretty much the tech doc written by Henric
IntervalMatch and Slowly Changing Dimensions
but with slight modifications, which you should carefully review.
For example the original uses:
// ============ Create the bridge table between the two interval types ============
TmpBridgeTable:
IntervalMatch (SubIntervalEnd, SPID)
Load distinct OriginalBegin, OriginalEnd, TmpSPID as SPID resident OriginalIntervals;
BridgeTable:
Load
SPID & '|' & SubIntervalEnd as SubIntervalID,
SPID & '|' & OriginalBegin & '|' & OriginalEnd as OriginalIntervalID
Resident TmpBridgeTable;
Drop Field TmpSPID;
Drop table TmpBridgeTable;
where you are using the SubIntervalBegin fields.
I'm afraid I don't understand the issue. Do you have a date 01.01.2017 in your data? Is that supposed to be the last date in your data? And you want to be able to select that date? If so, then why shouldn't there be an interval 31.12.2016-01.01.2017?
Magdalena, I think you are following Henric Cronstroem's tech doc about slowly changing dimensions, subchapter interval partioning.
There is a reason he is using
Dual(SubToDate, SubToDate - $(#vEpsilon)) as SubToDate
It creates a SubToDate showing the previously calculated SubToDate (which is effectively the start of the next subinterval for the same key), but having a numeric value just a little smaller / before.
This is to prevent that the Intervalmatch is making incorrect matches.
Looking at your data, if you remove the ceil() [which would recall the dual numeric part calculation shown above],
your interval match seem to correctly creating links for dates between 1.4.2016 and 31.12.2016 (including), while your PERSONALNUMMER seems to show exactely these BEGDA00001 and ENDDA00001 values for that interval.
Hence, I see no issue with removing the Ceil() function and just keeping to HIC's sample.
Or may I have missed something else?
When I count(distinct PERSONALNUMMER) for 01.01.2017, it shows me wrong data, becouse he counts wrong row with interval 31.12.2016-01.01.2017.
When I remove the ceil(), I have a problem with IntervalMatch calendar, because it does not refer to the date 01.01.2017 in the calendar intervals, when ENDDA0001 is 01.01.2017, so the last day in Calendar(%DATUM) is 31.12.2016 for ENDDA0001 (01.01.2017).
It's hard to tell how to fix your issue without looking at your data.
I remember I had to adjust the interval creation in another thread
How to check whether certain record timestamps overlap...
Maybe check if the discussion there is giving you any hint, if not, try to create a small sample QVW that demonstrates your setting and issue, then upload this sample to this thread (you can use dummy data for this sample, but it should show your issue).
Hi,
Sorry that's so late, in attached sample QVW.
BR,
Magda
I am not 100% sure (well, it would be easier with a small, reloadable sample application), but maybe it's enough
to use ceil in your calendar bridge creation:
BridgeTableKalendar_TMP:
IntervalMatch (%DATUM)
Load Distinct SubIntervalBegin, Ceil(SubIntervalEnd) as SubIntervalEnd Resident SubIntervals;
I tried this but unfortunately this doesn't work properly.
Could you check if attached model is correct?