Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
morenoju
Partner - Specialist
Partner - Specialist

Issue with nulls when using sum functions to calculate an average

Hi all,

I'm using this function to calculate a speed average:

sum({<start_timestamp=,link_date = p(Date_A),link_hour=p(hour_A)>}length*speed_mph)
/
sum({<start_timestamp=,link_date = p(Date_A),link_hour=p(hour_A)>}length)
/
count(Date_A)

My roads are divided in segments ("links") of different lengths. I decided to use the sum function because I want to calculate a weighted average. Example: if for 8 miles the speed was 50 and for 2 miles it was 70, the average should not be 60, but 54.

Users can pick more than one date to calculate these average, therefore the / count (Date_A) at the end.

If I pick just one date and there are missing values (nulls) for some of the hours, it is not a problem, I simply get a gap in the chart:

 image.png

 

However, if I pick two or more dates, the chart makes an average considering the null values as zero.

That's why picking 10/11 and 10/12, at 4:30 pm where the value should be 24 mph (the only value available is the 10/12 4:30 pm record which is 24) the graph is showing 24 + 0 / 2 = 12.

image.png

Do you have any idea on what to do so Qlik interprets the null values correctly?

I tried to use:

sum({<start_timestamp=,link_date = p(Date_A),link_hour=p(hour_A)>}length*speed_mph)
/
sum({<start_timestamp=,link_date = p(Date_A),link_hour=p(hour_A)>}length)
/
count({<start_timestamp=,link_date = p(Date_A),link_hour=p(hour_A)>}length*speed_mph)

That works when I have only one "link" (with one length) selected, but the numbers don't match for selection of multiple links.

Am I using a wrong approach to the problem?

Thanks much!

Labels (4)
1 Solution

Accepted Solutions
morenoju
Partner - Specialist
Partner - Specialist
Author

Hi Sunny, I have link_date in the x-axis and then Date_A, Date_B and Date_C to represent three different group of days at the same time for comparisons.

I think the key is using an aggregation. I'm now calculating first the weighted averages using the sum (length*speed) / sum (length) and then doing an additional average on link_date and link_time_15min:

avg(aggr(sum({<start_timestamp=,link_date = p(Date_A),link_hour=p(hour_A)>}length*speed_mph)/sum({<start_timestamp=,link_date = p(Date_A),link_hour=p(hour_A)>}length),link_date,link_time_15min))

I think this gives the correct result even for those dates with nulls. I'll test further and post an update later.

Thanks!

View solution in original post

6 Replies
sunny_talwar

How about this

sum({<start_timestamp=,link_date = p(Date_A),link_hour=p(hour_A)>}length*speed_mph)
/
sum({<start_timestamp=,link_date = p(Date_A),link_hour=p(hour_A)>}length)
/
count({<length *= {"*"}, speed_mph *= {"*"}>}Date_A)

morenoju
Partner - Specialist
Partner - Specialist
Author

Thanks Sunny, but still same result as dividing by count (Date_A).

I'm thinking of how I calculate this things manually: I first work with individual links. I calculate the average of the speed_mph for 4:00, 4:15, 4:30, etc. for the selected days.

Once I have the result, I do the weighted average:

((avg_speed1*length1)+(avg_speed2*length2))/(length1 + length2)

I'm looking at how to do that in Qlik, maybe with aggr, but so far no luck.

sunny_talwar

Why are you counting Date_A here and not link_date here?

morenoju
Partner - Specialist
Partner - Specialist
Author

Hi Sunny, I have link_date in the x-axis and then Date_A, Date_B and Date_C to represent three different group of days at the same time for comparisons.

I think the key is using an aggregation. I'm now calculating first the weighted averages using the sum (length*speed) / sum (length) and then doing an additional average on link_date and link_time_15min:

avg(aggr(sum({<start_timestamp=,link_date = p(Date_A),link_hour=p(hour_A)>}length*speed_mph)/sum({<start_timestamp=,link_date = p(Date_A),link_hour=p(hour_A)>}length),link_date,link_time_15min))

I think this gives the correct result even for those dates with nulls. I'll test further and post an update later.

Thanks!

sunny_talwar

Sounds good

morenoju
Partner - Specialist
Partner - Specialist
Author

I tested it and the new expression handles the nulls correctly!

Regards