17 Replies Latest reply: May 21, 2015 10:25 AM by Yakir Manor

Running 30 day average

Hi everybody

I'm having trouble getting QlikView to do what I believe is probably a relatively simple task...

I'd like it to calculate the running 30 day average of a certain expression. In this case the average Sales per User within the last 30 days before a given date. So for any date it should calculate the average based on the preceeding 30 days.

Here's how I calculate my current average Sales per User:

`sum({<LoginDays={'>4'}>} Sales)/count({<LoginDays={'>4'}>} distinct UserID)`

I'm using set analysis because I only want to count the users who have a certain amount of LoginDays, in this case 5.

Can anyone help me do a 30 day running average based on this calculation?

Thank you very much in advance for any advice you can give me. I'm learning to use QlikView the hard way and any help is much appreciated :-)

• Running 30 day average

hei attach is an example

hope it helps you

• Running 30 day average

Hi

Thank you very much for the reply, but I'm not quite sure I understand your suggestion. Using your example I tried to apply it to my own data in the graph in question:

`sum({<LoginDays={'>4'},Date={">=\$(=max(Date)-30) <\$(=max(Date))"}>} Sales)/count({<LoginDays={'>4'},Date={">=\$(=max(Date)-30) <\$(=max(Date))"}>} distinct UserID)`

But that just seems to give me the daily Sales per User for the last 30 days of my time selection. I have Date as my dimension variable. Am I doing something wrong?

• Running 30 day average

Could I somehow use rangesum and rangecount to achieve this?

• AW:Re: Running 30 day average

Hi,

we're using something like this:

`rangeavg(above(sum(Sales),0,30))rangeavg(above(sum({\$<LoginDays={'>4'}>} Sales),0,30))`

But, I'm not sure about the set analysis with your field LoginDays... Try it first without set analysis.

- Ralf

• AW:Re: Running 30 day average

Hi Ralf

It's possible to do what you suggest both with and without set analysis - but it's not really what I was looking for.

Firstly, your solution gives me the average DAILY sales for the last 30 days. I was looking to also divide this average with the number of users, so it should also count the number of Users within that time period and divide the sales by that count.

Secondly, It's not dynamic. In the sense that it begins counting from the first Date I select as my timeinterval. When in fact it should count from that day -30 days. Do you see my point? So a sort of dynamic accumulation of Sales and dynamic count of Users and the two divided with each other. If that makes sense...

• AW:Re: Running 30 day average

An easier way to explain it would be that for any given date, I need to calculate the total Sales divided by number of distinct users within the last 30 days of that date.

• AW:Re: AW:Re: Running 30 day average

this is hard to say without knowing your data and application...

• AW:Re: Running 30 day average

Hi

Above works OK (with or without the set expression), but it only takes into account data that is in the table. If there are missing days, then above(..,30) will include more tham 30 days in the expression. It also cannot look back beyond the start of the table, if you have some other selections active (such as a Month field selection)

I expect that a better approach would be to use a date island. I have attached a demo of this which hopefully will help you.

• AW:Re: AW:Re: Running 30 day average

I have no idea right now but you could fill the empty days with Sales=0....

• AW:Re: Running 30 day average

Hi Jonathan

I think you're definately on the right track. However, I'm not quite sure what a date island is, but after looking at you're example, I'm starting to get the picture. You're pretty much spot on in the way you have the data structured, that's pretty much a simplified version of the data I'm working with. However, I don't have missing dates, so that's not a problem.

Only problem is, you're solution involves doing it at the load script level. But since it's only on the dates, it shouldn't effect load time significantly, right?

I'm going to study your file and see if I can adopt it to my data.

Thank you!

• AW:Re: Running 30 day average

Hi

Just to clarify - a date island is a date table that is not connected to any other tables in the model (hence island). It exists to be used as a chart dimension for cases such as your problem. The expressions are connected to the date dimension via the if() statements. For example, I used:

Sum({<LoginDays={'>4'}, Date=>} If(Date > diDate - vDateRange AND Date <= diDate, Sales))

For the sales over the last 30 days. The set expressions for LoginDays is per your original post, and the Date= clause is to prevent a date selection from affecting the result. If you have other date fields (such as Month, Quarter, Year) which may be involved in selections, you should add them to the set expression.

The If() statement gets Sales where trhe Date is between the dimension date (diDate) and the date 30 days before the dimension date (or whatever number of days in vDateRange).

The load time implications will be negligible, as the table is populated by a few hundred calculated dates (depending on the date range in your data). Determining the min and max dates could take a little time if your main fact table is millions of lines in size.

The other problem that this approach solves is this. Assume that you would like to see the data one month at a time. So you would have a diMonth selection active. This approach will still correctly calculate the previous vDateRange days for the first day of the month, second day and so on). Verify that by making a selection on diDate to limit the chart/table range.

A final point is that the if expressions assume the dates to be pure date values, not date/time (timestamp). If not, the last part of the If statement should read:

AND Floor(Date) <= diDate,

Jonathan

• AW:Re: Running 30 day average

Hi Jonathan

Thank you very much for clearifying the Date Island solution, I get the principle know!

However, I can't get it to function in practicality...

I used your script to add to my existing script, so that I got diDate in there.

I created a table using diDate as dimension and the following expression for the 30 day base:

`Sum({<LoginDays={'>4'}, Date=>} If(Date > diDate - vDateRange AND Date <= diDate, Sales))/Count({<LoginDays={'>4'},Date=>} Distinct If(Date > diDate - vDateRange AND Date <= diDate, emailid))`

But the graph is stil just showing the accumulated sales starting from the first date in the selection and forwards... Also, It is taking QlikView a LONG time to calculate the graph. The loading bar for the graph is really slow. Here's the result shown once the graph is done loading:

This is even though I used diDate as dimension but selected the period using the ordinary Date field in my data...

Any ideas?

• AW:Re: Running 30 day average

Mads

First the easy question - since the date selection is overridden in the expressions, it should have no effect on the graph.

I expect that the performance will be slower than normal associations, as the association is happening with the interpreted expression, rather than using QV's internal associations.

I increased the test data set to > 100,000 rows, and I did notice a drop in performance, but only if I tried to display the entire data set. If I put a month filter (using diMonth, diYear), then I noticed no impact.

It comes to how big your data set is. If it is really large, and you don't need flexibility in the number of days, you could aggregate the sales and user counts in the load script. Your table then have a Sales30Days and User30Days fields. It will increase the model size somewhat, and reduce your flexibility, but will improve the graphs performance, as you would not be aggregating or using the data island.

If the charts performs OK if limited to a month at a time, you could design your chart so that it only displays the last 30 days, or put a calc condition for Count(distinct diMonth)=1 on the chart so that it only displays when a month is selected.... etc etc

As is often the case with these type of issues, there are several options, and best depends on the specific data set/problem.

Hope that all helps

Jonathan

• AW:Re: Running 30 day average

Here is the model with 100k+ records.

Warning file is quite large (7MB)

Jonathan

• AW:Re: Running 30 day average

Hi Jonathan

I was figuring you would say that about the accumulative fields in the load script. Unfortunately that's not really an option.

1. I have a single record for every user interaction with my site, including sales, so there is really no where to include the 30 day sales count and user count.

2. I'd like to be able to do the 30 day thing with a lot of the variables in the data set, not just the sales. So for that I would have to add a lot of extra fields to my data set, making it even larger.

I really appreciate all the work you've put into helping me. I must admit I thought this would be a much easier task for QlikView. I'm a little disappointed that it's not easier to do since this 30 day measurement is something that we use quite often in my company - since we define users as someone who's interacted with the site within the last 30 days.

• Running 30 day average

Ralf

Your answer was perfect! it' worked excelent at my QV. Thanks a lot!

regards

• Re: Running 30 day average

i resolved it in a nicer way, i wanted to add a dimension called period so i can select or deselect it, that way i can use it in bookmarks, you can see my implementation, i wanted:

7 last days - 1w.

last month - 1m - 30 days.

last year - 365 days back.

you can see my solution in the attached file, you can add more periods or change the calender the same way that was suggested in:

Qlikview Financial periods (MTD, YTD, LMTD, LYYTD)

enjoy!