6 Replies Latest reply: Jan 6, 2011 8:23 AM by Mads Grathe

# Calculated frequency distribution / histogram chart

Hi guys

Since you guys have been so effective in helping me out the last time around, I thought I'd try with yet another qlikview problem I'm having...

I'm trying to do a calculated frequency or histogram chart to see the distribution of spending among my users.

My data is organized into the following rows: Date, UserID, LoginDays, Sales

So what I want to do is a frequency distribution of spending (sum of Sales) within a certain time period and only for users with more than a certain number of LoginDays. I'm figuring I need to use Set Analysis to select these users out of the list. And following other discussions on frequency distribution charts in the forums, I'm figuring I need to use the class function.

My problem is that I can't seem to get these functions to work together. Here's what I'm doing as my dimension:

`class(sum(Sales),3)`

And here's what I have as my expression:

`count({<LoginDays={'>4'}>} distinct UserID)`

But I'm getting an "// Error in calculated dimension" message. Any ideas?

Also, is it possible to have a dynamic class interval depending on the selected number of dates/weeks/months etc.? It makes sense to have a larger interval in the class expression the larger the sum of sales... Is this possible?

Thank you in advance for any advice you can give me.

• ###### Calculated frequency distribution / histogram chart

The expression you're using looks OK to me. Seems like the only problem is the dimension.

You can't use an aggregation expression like sum() in a calculated dimension without using aggr() to tell it what dimensions to use. Think of it this way, if it literally did what you told it, it would sum up all sales in the entire system and classify it. That would only give you a single value. You won't get much of a histogram out of a single value.

You probably want to classify the sum of sales per user? So, for instance, how many users had total sales between 9 and 12? I believe you'd then want this as your dimension:

aggr(class(sum(Sales),3),UserID)

Well, except that you also only want to count users with more than 4 login days. So maybe this:

aggr(class(sum({<LoginDays={'>4'}>} Sales),3),UserID)

I also don't think there would be any problem making a dynamic class interval. Maybe something like this if you want a class width of 5 times the number of selected days.

aggr(class(sum({<LoginDays={'>4'}>} Sales,5*(max(total Date)-min(total Date)+1)),UserID)

Or you could just let the user set the class width, such as with a variable vClassWidth set by a slider:

aggr(class(sum({<LoginDays={'>4'}>} Sales),vClassWidth),UserID)

• ###### Calculated frequency distribution / histogram chart

Hi John

Happy new year and thank you very much for that very explicit answer.

However, I'm still having a bit of problems. Your second possibility works fairly well and is also as far as i got on my own after having researched the matter a little further.

The dynamic class interval solutions aren't working however. What does my expression need to be in that scenario? Currently it's set to:

`count({<LoginDays={'>4'}>} distinct UserID)`

But I get an error message if i try and use one of your two dynamic class interval solutions. Also, QlikView doesn't seem to recognize the vClassWidth command (it shows up as red in the Edit Expression field...)

Thanks again.

• ###### Calculated frequency distribution / histogram chart

The vClassWidth isn't a command, but a variable. There are several ways to add a variable, such as Settings -> Document Properties -> Variables -> New. Once you've added the variable, you can then create a slider object for it, probably with a min, max and static step.

I don't think you should need to change the expression at all for the dynamic class widths, so I'm not sure what's going wrong, though everything I posted was untested, so it's probably no surprise that there were problems.

If you want to post an example, I can try to get the two different dynamic width solutions working, or if you want some other sort of dynamic width, just let me know how you want it to behave so that I'm solving the right problem.

• ###### Calculated frequency distribution / histogram chart

Hi John

Thank you so much for your answer. I tried your idea with the custom vClassWidth variable, and it worked beautifully :-)

However, I'm presented with a few new issues...

First of all, it seems my graph has a tendency to not present intervals on the x-axis in an ordered fashion. For instance, if I select an interval of say 3, it shows starts with (18<=x<21), (6<=x<9), (3<=x<6), (27<=x<30), (15<=x<18), (12<=x<15) - Clearly it should start with (3<=x<6), (6<=x<9) and so forth...

Secondly, what would be the way to exclude the 0 values for Sales? If I include the users with no sales, I'll get a large reading on the zero mark, which makes sense, but makes my graph less usable if I want to be able to see the distribution of Sales per user. So I tried doing the following, which seems to work, but I want to make sure that I've understood things correctly:

`aggr(class(sum({<LoginDays={'>4'},Sales={'>0'}>} Sales),vClassWidth),UserID)`

Is this the correct way to go about it? Or is there a simple graph setting I should/could use instead?

I would really like to post an example, but since I'm extracting large amounts of data from a database and into QlikView, I don't know how to do a qvw file with a limited number of data points that I could post to show you...

Thanks for all your help, it's really helping me to understand the possibilities of this powerful tool.

• ###### Calculated frequency distribution / histogram chart

The class() function has two values, the text value you see, and an underlying numeric value. So you should be able to sort the column by numeric value, even though it is displaying text. It is like a date in that regard.

As for excluding users with no sales, you have the right idea and have implemented part of the solution. Let's take a look at what you did:

aggr(class(sum({<LoginDays={">4"},Sales={">0"}>} Sales),vClassWidth),UserID)

Note that the modifier is applying to the sum of sales. On the surface, it might seem that excluding 0 sales from a sum would do nothing. But what it's really doing is excluding users with 0 sales from your classifications since those users should end up with NULL sum of sales instead of 0. So if the ONLY users with sales in the 0-3 range are users with no sales at all, don't even create that classification. Since you say it is working for you, I'm going to guess that you have no users in the lowest classification except when they have no sales at all.

However, if there ARE any users with low enough sales to fit in the first classification, I believe it will then count ALL users in that classification unless you also modify the count(). So to be certain that first giant bar doesn't reappear, we also need to tell it to exclude those users from the count() as well:

count({<LoginDays={">4"},Sales={">0"}>} distinct UserID)

Between the two expressions, what you are effectively telling the system is "select all values of LoginDays > 4, then select all values of Sales >0, then classify the remaining users by total sales, then count the users in each class."

At least I THINK that's right.

Oh! Literals go in single quotes, search expressions in double quotes. So when using search expressions like >4 or >0, use double quotes. Single quotes sometimes work for search expressions, but I frankly consider this a QlikView bug, because what if your field LITERALLY had a possible value of '>4' and you wanted to select that exact value? The bug prevents you from doing so, at least in some cases I remember testing a while ago. Although I don't expect it to ever be fixed due to backwards compatibility issues, if they ever DO fix this bug, it could bite you. I wouldn't take the chance. I'd just get used to using double quotes on search expressions. Expressions above corrected.

• ###### Calculated frequency distribution / histogram chart

Hi John

That's actually exactly what I figured out myself as well, with both the sorting and the excluding of 0 values (excluding them in BOTH dimension and expression) - But it's really great to get confirmation from an experienced user.

Thanks for all your help - I really appreciate it!

See you on the boards :-)

- Mads