Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I need to use the last value in the row, where it is null.
I have 2 data sources, with irregular granularity. I tried to create 15 mins time intervals in order to make the calculations.
However, whenever if there is missing data in one source, the calculation failed. I tried to integrate peek() function in to my script but it doesn't work.
Another problem is being missing value in the out of time intervals. For example, I am going to make shiftly-wised analysis (00:00-08:00, 08:00-16:00 and 16:00-00:00), so I need a filter for this. However, if the missing value is in the beginning of the selected interval, seems like peek() won't work.
Here is the reproducable code,
[Energy]:
Load
Timestamp(Timestamp#(Timestamp,'DD.MM.YYYY hh:mm:ss'),'DD.MM.YYYY hh:mm:ss') as newDate,
time(Date(Floor(Timestamp(Timestamp#(Timestamp,'DD.MM.YYYY hh:mm:ss'),'DD.MM.YYYY hh:mm:ss'), .25/24),'DD.MM.YYYY hh:mm:ss'),'mm') as Bucket,
Date(Floor(Timestamp(Timestamp#(Timestamp,'DD.MM.YYYY hh:mm:ss'),'DD.MM.YYYY hh:mm:ss'), .25/24),'DD.MM.YYYY hh:mm:ss') as TimeBucket,
Energy
Inline [
Timestamp, Energy,
01.01.2023 00:06:32, 50,
01.01.2023 00:13:23, 60,
01.01.2023 00:16:43, 75,
01.01.2023 00:23:54, 65,
01.01.2023 00:35:54, 68,
01.01.2023 00:46:33, 60,
01.01.2023 00:54:23, 55,
01.01.2023 01:16:21, 40,
01.01.2023 01:35:43, 20];
[Production]:
Load
Timestamp(Timestamp#(Timestamp,'DD.MM.YYYY hh:mm:ss'),'DD.MM.YYYY hh:mm:ss') as newDate,
time(Date(Floor(Timestamp(Timestamp#(Timestamp,'DD.MM.YYYY hh:mm:ss'),'DD.MM.YYYY hh:mm:ss'), .25/24),'DD.MM.YYYY hh:mm:ss'),'mm') as Bucket,
Date(Floor(Timestamp(Timestamp#(Timestamp,'DD.MM.YYYY hh:mm:ss'),'DD.MM.YYYY hh:mm:ss'), .25/24),'DD.MM.YYYY hh:mm:ss') as TimeBucket,
CumulativeQty
Inline [
Timestamp, CumulativeQty,
31.12.2022 23:57:21, 89
01.01.2023 00:05:32, 100,
01.01.2023 00:14:23, 120,
01.01.2023 00:19:43, 130,
01.01.2023 00:25:54, 135,
01.01.2023 00:48:33, 150,
01.01.2023 00:55:23, 159,
01.01.2023 01:18:21, 168,
01.01.2023 01:34:43, 170];
[New1]:
Load
Bucket,
TimeBucket,
Energy
Resident [Energy];
Drop Table [Energy];
[New2]:
Load
Bucket,
TimeBucket,
(CumulativeQty),
If((isnull(CumulativeQty) or len(CumulativeQty)=0), Peek(CumulativeQty) , (CumulativeQty)) as CumulativeQty_New
Resident [Production];
Drop Table [Production];
Not exactly sure what you want to do. If I get it right your problem occurs when your very first entry is null because thatn peek will not find a value.
So the question is not technical it is practical: What number should be shown if there is not data available? If you just want to use 0 then you can try this
If((isnull(CumulativeQty) or len(CumulativeQty)=0), coalesce(Peek(CumulativeQty),0) , (CumulativeQty)) as CumulativeQty_New
or
If((isnull(CumulativeQty) or len(CumulativeQty)=0), rangemax(Peek(CumulativeQty),0) , (CumulativeQty)) as CumulativeQty_New
Furthermore you peek function is missing single quotes. it should be Peek('CumulativeQty') instead of Peek(CumulativeQty).
Last but not least you said something vague about a shit analysis. Have you tried using intervalmatch? Can't find this in your script.
Okay, let me explain a little bit detailed.
I have two datasets, one is energy, this data set includes the energy consumption. The sampling rate is irregular, can be changed 5 to 10 mins. So I need to create timebuckets for matching. The consumption value is already difference of cumulatives. So, I only need to have sum of the related timebuckets.
Second data set is the production quantities. It also has irregular sampling rate. I created timebuckets, but I only need the latest value of the related timebucket, to complete the calculation. but when there is no production, there won't be any data. There we have problem. I try to create timebuckets and match it (left join to energy dataset) with the energy dataset, in order to catch and filter the energy consumption when there is no production, and energy per product KPIs and so on.
for example, when you run the script, you will see that, there is no data for 00:30 timebucket. To complete the calculate the difference, I need to put 135, inside of the 00:30 timebucket. means that, if there is no data I need to put the latest value.
Hope that I clarified.
Why don't you just use sum(Energy) in the first expression?
Second expression can be coalesce(max(CumulativeQty_New),above(max(CumulativeQty_New))) - this should solve the problem in that case. but you might still have problems if there are multiple null values in a row.
thanks for reply, but still does not work.
On the other hand, this is just a simple dataset for explaining and to be reproduceable for the contributors. Unfortunately in real case, I have more than 1 empty rows.
Hi,
I copied your script and tested it. for me it works.