Skip to main content
Announcements
Accelerate Your Success: Fuel your data and AI journey with the right services, delivered by our experts. Learn More
cancel
Showing results for 
Search instead for 
Did you mean: 
acme
Contributor III
Contributor III

Using of Last Values in null

Hi,

I need to use the last value in the row, where it is null.

I have 2 data sources, with irregular granularity. I tried to create 15 mins time intervals in order to make the calculations. 

However, whenever if there is missing data in one source, the calculation failed. I tried to integrate peek() function in to my script but it doesn't work.

Another problem is being missing value in the out of time intervals. For example, I am going to make shiftly-wised analysis (00:00-08:00, 08:00-16:00 and 16:00-00:00), so I need a filter for this. However, if the missing value is in the beginning of the selected interval, seems like peek() won't work.

Here is the reproducable code,

[Energy]:
Load
Timestamp(Timestamp#(Timestamp,'DD.MM.YYYY hh:mm:ss'),'DD.MM.YYYY hh:mm:ss') as newDate,
time(Date(Floor(Timestamp(Timestamp#(Timestamp,'DD.MM.YYYY hh:mm:ss'),'DD.MM.YYYY hh:mm:ss'), .25/24),'DD.MM.YYYY hh:mm:ss'),'mm') as Bucket,
Date(Floor(Timestamp(Timestamp#(Timestamp,'DD.MM.YYYY hh:mm:ss'),'DD.MM.YYYY hh:mm:ss'), .25/24),'DD.MM.YYYY hh:mm:ss') as TimeBucket,
Energy

Inline [
Timestamp, Energy,
01.01.2023 00:06:32, 50,
01.01.2023 00:13:23, 60,
01.01.2023 00:16:43, 75,
01.01.2023 00:23:54, 65,
01.01.2023 00:35:54, 68,
01.01.2023 00:46:33, 60,
01.01.2023 00:54:23, 55,
01.01.2023 01:16:21, 40,
01.01.2023 01:35:43, 20];

[Production]:
Load
Timestamp(Timestamp#(Timestamp,'DD.MM.YYYY hh:mm:ss'),'DD.MM.YYYY hh:mm:ss') as newDate,
time(Date(Floor(Timestamp(Timestamp#(Timestamp,'DD.MM.YYYY hh:mm:ss'),'DD.MM.YYYY hh:mm:ss'), .25/24),'DD.MM.YYYY hh:mm:ss'),'mm') as Bucket,
Date(Floor(Timestamp(Timestamp#(Timestamp,'DD.MM.YYYY hh:mm:ss'),'DD.MM.YYYY hh:mm:ss'), .25/24),'DD.MM.YYYY hh:mm:ss') as TimeBucket,
CumulativeQty

Inline [
Timestamp, CumulativeQty,
31.12.2022 23:57:21, 89
01.01.2023 00:05:32, 100,
01.01.2023 00:14:23, 120,
01.01.2023 00:19:43, 130,
01.01.2023 00:25:54, 135,
01.01.2023 00:48:33, 150,
01.01.2023 00:55:23, 159,
01.01.2023 01:18:21, 168,
01.01.2023 01:34:43, 170];

[New1]:
Load
Bucket,
TimeBucket,
Energy
Resident [Energy];
Drop Table [Energy];


[New2]:
Load
Bucket,
TimeBucket,
(CumulativeQty),
If((isnull(CumulativeQty) or len(CumulativeQty)=0), Peek(CumulativeQty) , (CumulativeQty)) as CumulativeQty_New
Resident [Production];
Drop Table [Production];

Labels (3)
5 Replies
Oliver_F
Partner - Creator III
Partner - Creator III

Not exactly sure what you want to do. If I get it right your problem occurs when your very first entry is null because thatn peek will not find a value.

So the question is not technical it is practical: What number should be shown if there is not data available? If you just want to use 0 then you can try this

If((isnull(CumulativeQty) or len(CumulativeQty)=0), coalesce(Peek(CumulativeQty),0) , (CumulativeQty)) as CumulativeQty_New

or

If((isnull(CumulativeQty) or len(CumulativeQty)=0), rangemax(Peek(CumulativeQty),0) , (CumulativeQty)) as CumulativeQty_New

 

Furthermore you peek function is missing single quotes. it should be Peek('CumulativeQty') instead of Peek(CumulativeQty).

 

Last but not least you said something vague about a shit analysis. Have you tried using intervalmatch? Can't find this in your script.

acme
Contributor III
Contributor III
Author

Okay, let me explain a little bit detailed.

I have two datasets, one is energy, this data set includes the energy consumption. The sampling rate is irregular, can be changed 5 to 10 mins. So I need to create timebuckets for matching. The consumption value is already difference of cumulatives. So, I only need to have sum of the related timebuckets.

Second data set is the production quantities. It also has irregular sampling rate. I created timebuckets, but I only need the latest value of the related timebucket, to complete the calculation. but when there is no production, there won't be any data. There we have problem. I try to create timebuckets and match it (left join to energy dataset) with the energy dataset, in order to catch and filter the energy consumption when there is no production, and energy per product KPIs and so on.

 

for example, when you run the script, you will see that, there is no data for 00:30 timebucket. To complete the calculate the difference, I need to put 135, inside of the 00:30 timebucket. means that, if there is no data I need to put the latest value.

 

Hope that I clarified.

 

acme_0-1672997266507.png

 

Oliver_F
Partner - Creator III
Partner - Creator III

Why don't you just use sum(Energy) in the first expression?

 

Second expression can be coalesce(max(CumulativeQty_New),above(max(CumulativeQty_New))) - this should solve the problem in that case. but you might still have problems if there are multiple null values in a row.

acme
Contributor III
Contributor III
Author

thanks for reply, but still does not work.

On the other hand, this is just a simple dataset for explaining and to be reproduceable for the contributors. Unfortunately in real case, I have more than 1 empty rows.

acme_0-1673003123694.png

 

Oliver_F
Partner - Creator III
Partner - Creator III

Hi,

I copied your script and tested it. for me it works.

coalesce.png