Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
I have a problem with calculate a daily average.
A model contains a daily table that shows the status of each product, day and store. It is a table that contains 50M rows and the number increases every day. I need to build measures that show the daily average of products that are in certain statuses (there are also sub-status). I give you a simple example.
At first I calculated it in another table in the script:
Summary:
Load
1 as TypeCode,
Date,
Count(ProductID) as DailyCount,
...
From [table]
Where StatusID = 1
Group By Date;
Concatenate
Load
2 as TypeCode,
Date,
Count(ProductID) as DailyCount,
...
From [table]
Where StatusID = 7 or StatusID = 8
Group By Date;
In the UI sheet I used the measure Avg({<TypeCode={[relevant code]}>}DailyCount) and I get a value that seem correct.
The problem - table like that takes a lot of time and resources (50M rows each time).
So I tried to calculate the measure without create this summary table.
Avg(Aggr(Count(ProductID),Date))
In this way I get lower and incorrect values, even though the calculation should be the same.
What am I missing here?
I think one of the reasons is that there are dates until the end of the year and not until today.
Thanks for the helpers.
Hi @Amit_B
Avoiding using the Avg function is often a good idea, as you can't be entirely sure what it is doing. If you do the divide yourself you can see both the total value and the thing it is being divided by to check you get what you want.
You can just load from the table once and set the TypeCode, like this:
LOAD
if(StatusID = 1, 1, if(StatusID = 7 or StatusID = 8, 2, 3)) as TypeCode,
The aggregation by Date is also not required here.
With the count of ProductID, you will find adding a value in a table, rather than counting field values will be more efficient. If ProductID is a foreign key and there will always be an entry in the dimension table adding a counter to the Product table will speed things up:
LOAD
1 as ProductCount,
ProductID,
Your total number of products over all dates will be:
sum(aggr(sum(ProductCount), Date))
And that can be divided by the total number of dates where there is one or more products:
count(DISTINCT {<ProductCount*={1}>}Date)
Or, if you want to divide by the number of days from the first to the last date where there is a product:
(max({<ProductCount*={1}>}Date) - min({<ProductCount*={1}>}Date))+1
You may require a TOTAL inside the min and the max, if you don't want the dimension to be taken into account.
By breaking it up you can check both of the numbers before you do one over the other.
Hope that something in that is useful to you.
Cheers,
Steve
Thanks for the detailed answer! I will check it.