count() and key values - Qlik Community

Report Inappropriate Content · ‎2010-02-17

Hi!

I'm trying to wrap my head around how QlikView does certain calculations.

Let's say I have three tables, Customers, Transactions and Incidents like so:

Customers
---------
CustomerID
Name
CreatedDate

Transactions
---------
TransactionID
CustomerID
Amount

Incidents
---------
IncidentID
CustomerID
Description

Now, if I wanted to show a bar chart with two expressions, number of customers who were created in the past year and, of those, how many actually have any transactions. How would I go about this?

The expression I originally came up with for the first one was:

=count( if(date(CreatedDate) > (today()-365), CustomerID))

However, I don't understand how QlikView can know which CustomerID I want. That key is only unique in the Customers table, how do I know it's not counting the amount of rows in the Transactions or Incidents table with that key? Or is it incorrect to use aggregation functions with a key field? For example, the following expression returns different values:

=count( if(date(CreatedDate) > (today()-365), 1))

Ideally I would like some way of specifying which table it should be summing from. Is this possible, or have I misunderstood something completely?

The dimension is monthly so there are 12 bars in the chart per expression.

Thanks in advance!

Report Inappropriate Content · ‎2010-02-17

Hi The CustomerID is a Key Field, and it is best practice not to use the function count on key fields.

If you take the field CustomerID and look at the values it will give you a distinct list of the three instances of CustomerID however you may only want the values that exist in the customers table. I would suggest the following. Remember each table join in QV is a double outer join.

In the script I would add to the customers table a number of 'counter' fields. Some people add counter fields to all tables however I only add them where necessary.

Customers table i would add the line

'1' as Customer_Count // you would use the function sum on this field

On the transactions table I would add the line

CustomerID as Customer_Counta // the 'a' represents that you would use a count function and you should use Distinct in the function as below

count(Distinct Customer_Counta)

You are on the right track with the IF statements, however please check you date formats as this can cause problems. Depending on the source data a tried and tested way of making sure a date is a TRUe Date I would usually perform the below transformation in the script.

DATE(floor(CreatedDate),'DD/MM/YYYY') as CreatedDate

The Floor command will remove a time if there is one and turns it to a number the date function then makes it a TRUE date.

Hope this is of help,

Neil

Report Inappropriate Content · ‎2010-02-17

I managed to get it working by introducing a reference to a dummy calculation referring to a field in the table I was interested in, e.g. AND transactionID > 0.

Still, it'd be great to be able to specify which table should be used in aggregations.

johnw · ‎2010-02-17

Really, there's only one CustomerID, but in any case, QlikView has no idea from which table you want it to count. In fact, different versions of QlikView have made different assumptions, and thus returned different results. Never ever count(KeyField).

However, I think it's going to far to say "it is best practice to not use the function count on key fields."

How am I being consistent? Because you can do a count(distinct KeyField). That should always return the right answer, and if QlikView is smart, it should be able to return the result very efficiently. So never count(KeyField), but doing a count(distinct KeyField) is OK to the best of my knowledge.

Other solutions, such as creating a duplicate key field, or a counter on the table you meant, are certainly workable. But I don't see any reason for the additional complexity. There might BE a reason; I just don't know of one.

As far as telling QlikView which table to use in aggregations, that's just not how it works, but you'll get the hang of how it does work. QlikView is creating an internally consistent set of ALL of your selected data, regardless of table, and executing your expressions on that entire data set. The solution in your example is not to specify "From the Customers Table", but probably rather to count distinct values of the CustomerID.

I wouldn't trust "AND transactionID > 0". All that tells QlikView is you only want customers with transactions. It doesn't tell QlikView to count from the transaction table. You might end up with the answer you want (and apparently you did)... or you might not (or it might change in future versions, since you're asking for a value without a strict definition).

Report Inappropriate Content · ‎2010-02-18

Thanks for the clarification. I forgot to mention that in my previous comment. What I ended up with was something like

=count(DISTINCT if(date(CreatedDate) > date(MonthStart(today()-365)) and TransactionID > 0, CustomerID))

Would this kind of use of Count() be OK for the future?

johnw · ‎2010-02-18

You know, I'm not certain enough that I'm really comfortable saying "yes".

First, what's your requirement? Are you trying to count the number of new customers (created within the past year) that have transactions? That's what that expression looks like to me. And it should do the trick. But... eh, it just makes me a little nervous for some reason.

So I apologize, but let me change my mind a bit on what I said yesterday, at least for this specific case. I do think that in this case, it may be worth adding a new field. That will let you be more certain you're OK, and it will also speed up your charts. I'd add this field to your Customer table.

if(CreatedDate>addmonths(monthstart(today()),-12),CustomerID) as NewCustomerID,

If you just want a count of new customers, you can then just count(NewCustomerID). But it looks like you want a count of customers with transactions with a positive ID. It's probably more efficient to do that with set analysis than an IF. So I'm thinking this:

count({<TransactionID*={">0"}>} NewCustomerID)

But I'm not sure if I have that exactly right. Hard to test without your data set, but hopefully that's enough.

Mind you, probably your approach (fixed for addmonths()) would produce the same answer. You don't really need the new field. But I think it'll be faster. And if performance is a BIG issue, you could instead do this:

if(CreatedDate>addmonths(monthstart(today()),-12),1) as NewCustomerCount,

Then experiment with both of these:

count({<TransactionID*={">0"}>} NewCustomerCount)
sum({<TransactionID*={">0"}>} NewCustomerCount)

I can't remember which was which, but it was something like sum() was faster in 8.5, but count() was faster in 9.0. Can't remember for sure, so worth checking both, at least if performance is a serious concern. I think having NewCustomerID is more flexible, though, as you can use it in a chart, for instance.

hectorgarcia · ‎2010-02-18

I'm agree with john, what I do in this cases is to flag the different tables in order to prevent this potencial incosistency, in your example if you have flagged the orders table with 1 as flagorders you can count distinct customerid witj orders (the key field) that has the flagorders=1

something like count(${<flagorders={1}>} distinct CusomerId)

Hector