Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
HillVi
Partner - Contributor II
Partner - Contributor II

Time Average - dealing with irregular intervals

Hi everyone!

I am trying to calculate average value per hour. However, the data I have is in irregular intervals. For instance:

Timestamp Value
2022-05-25 00:35 10
2022-05-25 00:36 10
2022-05-25 00:37 10
2022-05-25 00:38 0
2022-05-25 00:48 0
2022-05-25 00:58 0
2022-05-25 01:01 10

 

Just using avg(Value) for hour 00 here results in 5 as the sum is divided by number of rows. But the intervals are irregular thus the last value (0) applies until next value (or until next hour). This issue obviously affects hourly averages and makes them incorrect. What I am in need of is some kind of time average, where time is taken into consideration.

Any help or input would be very much appreciated!

Labels (1)
1 Solution

Accepted Solutions
MarcoWedel

maybe one solution could be to calculate and split intervals based on the next timestamp or full hour, e.g. like this:

table1:
LOAD Timestamp(Timestamp,'YYYY-MM-DD hh:mm') as Timestamp,
     Value
Inline [
Timestamp,	Value
2022-05-25 00:35,	10
2022-05-25 00:36,	10
2022-05-25 00:37,	10
2022-05-25 00:38,	0
2022-05-25 00:48,	0
2022-05-25 00:58,	0
2022-05-25 01:01,	10
2022-05-25 01:10,	5
2022-05-25 01:50,	15
2022-05-25 02:30,	2
2022-05-25 02:55,	8
2022-05-25 03:15,	16
2022-05-25 04:00,	20
2022-05-25 08:10,	10
2022-05-25 12:30,	0
2022-05-25 14:50,	5
2022-05-25 14:59,	15
2022-05-25 15:00,	20
2022-05-25 15:01,	25
2022-05-25 15:50,	30
2022-05-25 23:30,	10
2022-05-26 05:00,	5
];

table2:
LOAD *,
     Timestamp(Timestamp#(Timestamp(Start,'YYYY-MM-DD hh'),'YYYY-MM-DD hh'),'YYYY-MM-DD hh:mm') as TimestampHour,
     Interval#(Interval(End-Start,'hh:mm'),'hh:mm') as Duration;
LOAD Timestamp,
     Value,
     Timestamp#(Timestamp(RangeMax(Start,Floor(Start,1/24)+(IterNo()-1)/24),'YYYY-MM-DD hh:mm'),'YYYY-MM-DD hh:mm')	as Start,
     Timestamp#(Timestamp(RangeMin(End,  Floor(Start,1/24)+ IterNo()   /24),'YYYY-MM-DD hh:mm'),'YYYY-MM-DD hh:mm')	as End
While Timestamp#(Timestamp(Floor(Start,1/24)+(IterNo()-1)/24,'YYYY-MM-DD hh:mm'),'YYYY-MM-DD hh:mm') < End;
LOAD *,
     Timestamp as Start,
     Alt(Previous(Timestamp),Timestamp) as End
Resident table1
Order By Timestamp desc;

DROP Table table1;

 

your time weighted average then could be something like:

Sum(Value*Duration)/Sum(Duration)

using TimestampHour as your dimension

hope this helps

Marco

 

View solution in original post

3 Replies
Or
MVP
MVP

Normally, you could just use a weighted average by subtracting the end time from the start time. However, in this case, you don't actually have that information, as the "length" of a row is determined by the following row (if one exists), and there is nothing to tell us how long the last row is. 

I'd suggest having a look at this document: https://community.qlik.com/t5/QlikView-Documents/Generating-Missing-Data-In-QlikView/ta-p/1491394

Specifically, pages 7 and 8 (how to populate a sparsely populated field) - doing so will create a scenario where each line refers to a single one-minute period and you can then use a straight average.

If there's not practical, you could try calculating your own weighted average using something like (Below(Timestamp) - Timestamp) * Value

But this would require some specific aggr() for the below() since you'd need to wrap this entire thing in a sum, and it would also require you handle the last row individually since there are no values below it.

MarcoWedel

maybe one solution could be to calculate and split intervals based on the next timestamp or full hour, e.g. like this:

table1:
LOAD Timestamp(Timestamp,'YYYY-MM-DD hh:mm') as Timestamp,
     Value
Inline [
Timestamp,	Value
2022-05-25 00:35,	10
2022-05-25 00:36,	10
2022-05-25 00:37,	10
2022-05-25 00:38,	0
2022-05-25 00:48,	0
2022-05-25 00:58,	0
2022-05-25 01:01,	10
2022-05-25 01:10,	5
2022-05-25 01:50,	15
2022-05-25 02:30,	2
2022-05-25 02:55,	8
2022-05-25 03:15,	16
2022-05-25 04:00,	20
2022-05-25 08:10,	10
2022-05-25 12:30,	0
2022-05-25 14:50,	5
2022-05-25 14:59,	15
2022-05-25 15:00,	20
2022-05-25 15:01,	25
2022-05-25 15:50,	30
2022-05-25 23:30,	10
2022-05-26 05:00,	5
];

table2:
LOAD *,
     Timestamp(Timestamp#(Timestamp(Start,'YYYY-MM-DD hh'),'YYYY-MM-DD hh'),'YYYY-MM-DD hh:mm') as TimestampHour,
     Interval#(Interval(End-Start,'hh:mm'),'hh:mm') as Duration;
LOAD Timestamp,
     Value,
     Timestamp#(Timestamp(RangeMax(Start,Floor(Start,1/24)+(IterNo()-1)/24),'YYYY-MM-DD hh:mm'),'YYYY-MM-DD hh:mm')	as Start,
     Timestamp#(Timestamp(RangeMin(End,  Floor(Start,1/24)+ IterNo()   /24),'YYYY-MM-DD hh:mm'),'YYYY-MM-DD hh:mm')	as End
While Timestamp#(Timestamp(Floor(Start,1/24)+(IterNo()-1)/24,'YYYY-MM-DD hh:mm'),'YYYY-MM-DD hh:mm') < End;
LOAD *,
     Timestamp as Start,
     Alt(Previous(Timestamp),Timestamp) as End
Resident table1
Order By Timestamp desc;

DROP Table table1;

 

your time weighted average then could be something like:

Sum(Value*Duration)/Sum(Duration)

using TimestampHour as your dimension

hope this helps

Marco

 

HillVi
Partner - Contributor II
Partner - Contributor II
Author

Thanks Marco and Or for you valuable input! I wouldn't have figured it out otherwise.

It was quite testing and took a few hours, but I finally managed to overcome the issue by calculating based on duration/proportion of an hour.