Field selections AND-mode for Market Basket Analysis
Field selections AND-mode for Market Basket Analysis.qvf
Qlik Sense is an awesome tool when it comes to Data Discovery. However, in some situations there is no straightforward solution and one must perform a little magic with associative data model.
One such situation is creating Market Basket Analysis. Challenge is to enable AND functionality for user selections instead of the standard OR. What do I mean?
Let’s use a simple sales data example. A customer buys either one product, or two or more. When we load all data it’s easy to filter orders by products.
When one product is selected, we get something like this:
When two products are selected, we get:
Selecting 2 or more products will result in logical OR condition, so we get all sales records with first or second product.
Data analysts working with similar (albeit much more complex) sales data are often interested in combination of products on sales orders. They need a logical AND selection to get only those records where first as well as the second product was sold.
How to achieve this behavior in Qlik Sense?
Starting is quite easy and without any need for scripting. Using the incredible Aggr() function we can create a calculated dimension, which will count all different products sold on each order.
When we place this dimension into a bar chart, we get a nice overview. Moreover, selecting a particular value will give us all orders where that particular count of distinct products happened.
This simple approach can help to reveal business insights which would remain hidden with standard selections in product dimension.
But, let’s go a bit further. What if we want to start from a specific product and see all different sales combinations that happened?
Field Selection AND-mode
We’ll have to do a bit of scripting, but all operations will be quite basic. There are two main approaches I use.
First approach is really simple, flexible and great for data discovery, because all associations are nicely revealed with Qlik’s green, white and grey. One drawback is that the resulting table can grow to massive proportions with really large datasets.
1. Load all orders and products again into a separate table and name the product field differently
2. Join the same table to the previous one and repeat it as many times as many AND selections you want. Beware that this is a cross join of products on order so think carefully about the scale of data and go step by step to see the impact on memory. I join two times in this example.
3. It’s good to clean the resulting table a bit with a few conditional operations in order to dispose of irrelevant or duplicate combinations (example script is in the attached document)
4. (optional) If we use IDs instead of names in previous scripts (which is recommended), then we load the product dimension (along with some categories and types for more analytical options) to a separate table for each AND field.
Now we can filter Orders by product combinations and also see the next associative options:
Second Approach and it's variation using SET analysis
This one is much less memory demanding, but it doesn’t provide such a nice associative experience. It’s used when working with really large datasets (100s millions of rows) along with small number of distinct entities (e.g. up to 10 products on one order), or when the sequence of entities is relevant (accounting, production, service desk etc.).
1. Concatenate all different products on one order to one line
2. Use SubField() function to put each product to a separate field. (Create this script with a FOR...Each loop if the maximum number of distinct products on order is not known beforehand.)
3. (optional) As with the previous approach it's recommended to use IDs and load separate product dimensions for each AND field.
If all scripts were executed successfully we get the desired AND-mode for selections in both cases.
All would be well, but there is one more requirement which is quite common – AND NOT mode. First approach will not help us with this but nothing is ever lost with Qlik. We can utilize variables and expression SET analysis along with the second approach. This basically creates a third approach > dynamic search in a list.
1. Create a concatenated field for all distinct products on order as with the second approach
2. Load product dimension again into separate tables. In our example we will have a selection for initial product(s), AND product(s) and AND NOT products.
3. Create variables that will be used to construct the SET analysis in expression.
- for initial products
- for AND products
- for AND-NOT products
4. Finally put all variables into SET analysis of the expression used in the chart
This will enable AND-mode as well as AND-NOT-mode in selections.
Check the attached file to see all 3 approaches in action. I've been successfully using these approaches in my projects, but I’m looking forward to hearing any comments and suggestions.
PS: Thanks to Mária Šándorová ( @JaMajka1 ) for suggestions.