Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I have a relatively simple task, but I’m currently stuck at one point.
I want to create something similar to an Intervalmatch, but optimized for mapping, as a traditional Intervalmatch would take too long in this case.
I have a mapping table with timestamps, which usually differ only slightly from those in the target table, but in most cases, they almost match. Now, I’d like to define an additional column that maps the correct value from the mapping table to the target table, as long as the timestamp in the mapping table doesn't deviate by more than 10 seconds in the past.
Here’s a sample code to illustrate:
MAPPING_TABLE:
MAPPING LOAD
KEY & '-' & Date & '-' & Time,
Time
INLINE [
KEY, Date,Time,
1, 22.05.2024,13:10:41
1, 22.05.2024,13:11:40
1, 22.05.2024,13:11:37
];
TABLE2:
LOAD *,
APPLYMAP('MAPPING_TABLE', KEY & '-' & Date2, 'N/A') AS Time; // Need the correct code here;
LOAD *,
Time(Time2 - MakeTime(0, 0, 10)) AS TimeDelta
INLINE [
KEY, Date2, Time2
1, 22.05.2024, 13:11:45
1, 22.05.2024, 13:11:41
1, 22.05.2024, 13:11:38
];
My goal is to create the following table:
GOAL_TABLE:
LOAD *
INLINE [
KEY, Date2, Time2, Time,
1, 22.05.2024, 13:11:45, N/A
1, 22.05.2024, 13:11:41, 13:11:40
1, 22.05.2024, 13:11:38, 13:11:37
];
Hopfefully you can support me here.
Best regards Son
@SonPhanHumanIT It seems complex transformation. It should be done in ETL ideally if data volumes are huge. If data volumes are not huge then you can try something below. It does cross join on Key,Date & Time which is heavy operation
Data:
LOAD *,
KEY & '|' & Date & '|' & Time as _key,
Time(Time - MakeTime(0, 0, 10)) AS TimeFrom
INLINE [
KEY, Date, Time
1, 22.05.2024, 13:11:45
1, 22.05.2024, 13:11:41
1, 22.05.2024, 13:11:38
];
Join(Data)
//Mapping table
LOAD * INLINE [
KEY, Date,Time_To_Map
1, 22.05.2024,13:10:41
1, 22.05.2024,13:11:40
1, 22.05.2024,13:11:37
];
T1:
Load *,
if(floor(interval(Time_To_Map-TimeFrom,'hh:mm:ss'))>=0 and
second(interval(Time_To_Map-TimeFrom,'hh:mm:ss'))<=10,1,0) as Flag1
Resident Data;
Drop Table Data;
T2:
Load *,
if(KEY = Previous(KEY) and Date = Previous(Date) and Time_To_Map=Previous(Time_To_Map)
and Previous(Flag1)=1,0,Flag1) as Flag
Resident T1
Where Flag1=1
Order by KEY,Date,Time_To_Map,Time;
Drop Table T1;
Final:
NoConcatenate
Load *
Resident T2
where Flag=1;
Drop Table T2;
// Append not existed keys from original table to list not mapped records.
Concatenate(Final)
LOAD *
where not Exists(_key);
LOAD *,KEY & '|' & Date & '|' & Time as _key
INLINE [
KEY, Date, Time
1, 22.05.2024, 13:11:45
1, 22.05.2024, 13:11:41
1, 22.05.2024, 13:11:38
];
Drop Fields _key,Flag,Flag1,TimeFrom;
@SonPhanHumanIT It's just your assumption or you already tried Intervalmatch approach to find that it's taking too long?
personally, i wouldnt take this route but yeah if you require to take this route probably you could try something like below:
TABLE1:
LOAD Date1,
Time1,
SUBFIELD(Time1,':',3) AS Seconds1,
RowNo() AS %KEY;
LOAD * INLINE [
KEY, Date1,Time1,
1, 22.05.2024,13:10:41
1, 22.05.2024,13:11:40
1, 22.05.2024,13:11:37
];
JOIN(TABLE1)
TABLE2:
LOAD KEY,
Date2,
Time2,
SUBFIELD(Time2,':',3) AS Seconds2,
RowNo() AS %KEY;
LOAD * INLINE [
KEY, Date2, Time2
1, 22.05.2024, 13:11:45
1, 22.05.2024, 13:11:41
1, 22.05.2024, 13:11:38
];
set vtimedelta = 2;
NOCONCATENATE
TABLE:
LOAD *,
if((Seconds2 < Seconds1 + $(vtimedelta)) or (Seconds2 < Seconds1 - $(vtimedelta)), Time1, 'NA') as Time_Filled
RESIDENT TABLE1;
DROP TABLES TABLE1;
DROP FIELDS Seconds1, Seconds2;
Unfortunately, it's not possible to use an interval match when the key is not distinct in this case.
In your case, you would need to sort the Time2
values, but this could lead to an error if you try to match a range, such as from 13:12:01 to 13:11:59. However, thanks for your suggestion!
Thats not an issue as long as you %KEY matches from the tables no matter what your order in the tbale is.
An intervalmatch isn't needed else the mapping-table could be expanded to all necessary combinations of values, for example with a while-loop, like here simplified:
mapping load key & '|' & (from + iterno() - 1) as lookup, from as return
from X while from + iterno() - 1 <= from + 10;
This means the essential part is to prepare the mapping-table and depending on the real complexity the task might be distributed to n steps in preparing the data with n conditions/transformations and with n forward/backward loops and so on ... By larger data-sets it might also useful to filter in beforehand the mapping and the facts per exists() against each other to exclude those ones which don't need an adjustment.
@SonPhanHumanIT It seems complex transformation. It should be done in ETL ideally if data volumes are huge. If data volumes are not huge then you can try something below. It does cross join on Key,Date & Time which is heavy operation
Data:
LOAD *,
KEY & '|' & Date & '|' & Time as _key,
Time(Time - MakeTime(0, 0, 10)) AS TimeFrom
INLINE [
KEY, Date, Time
1, 22.05.2024, 13:11:45
1, 22.05.2024, 13:11:41
1, 22.05.2024, 13:11:38
];
Join(Data)
//Mapping table
LOAD * INLINE [
KEY, Date,Time_To_Map
1, 22.05.2024,13:10:41
1, 22.05.2024,13:11:40
1, 22.05.2024,13:11:37
];
T1:
Load *,
if(floor(interval(Time_To_Map-TimeFrom,'hh:mm:ss'))>=0 and
second(interval(Time_To_Map-TimeFrom,'hh:mm:ss'))<=10,1,0) as Flag1
Resident Data;
Drop Table Data;
T2:
Load *,
if(KEY = Previous(KEY) and Date = Previous(Date) and Time_To_Map=Previous(Time_To_Map)
and Previous(Flag1)=1,0,Flag1) as Flag
Resident T1
Where Flag1=1
Order by KEY,Date,Time_To_Map,Time;
Drop Table T1;
Final:
NoConcatenate
Load *
Resident T2
where Flag=1;
Drop Table T2;
// Append not existed keys from original table to list not mapped records.
Concatenate(Final)
LOAD *
where not Exists(_key);
LOAD *,KEY & '|' & Date & '|' & Time as _key
INLINE [
KEY, Date, Time
1, 22.05.2024, 13:11:45
1, 22.05.2024, 13:11:41
1, 22.05.2024, 13:11:38
];
Drop Fields _key,Flag,Flag1,TimeFrom;