My latest blog post was on joins in QlikView (see To Join or not to Join). In it I claimed that you should avoid making joins in the QlikView script, if possible. This blog post is about a function that can help you avoid joins. It is about the function Applymap().
It is, in fact, one of my favorite functions and I do not seem to be alone in this choice. Several of the QlikView developers who regularly write on QlikCommunity seem to share this preference and have written about this function, e.g., QlikView Notes, Quick Intelligence and QVApps.
So, what does the function do? Well, basically it is just a lookup function – it takes one value as input, checks if this value exists in a mapping table and returns the corresponding value from the second column in the mapping table. Think of it as a translation function where the translation is fetched from a pre-defined, two-column table. A simple translation function, but you can still do a lot with it...
What has this got to do with joins? Everything. A very common situation in data modeling is that you need to get a single field value (per record) from a different table than the current one. Then the standard way to do this is to use a join. However, in QlikView you can – and should – use Applymap() instead. The only time that you cannot use Applymap() instead of a join is if you want to fetch more than one single corresponding value (record) from the second table.
Let's look at an example of how to use Applymap(): In an ERP system, the customer table is the table where all information about the customer is stored, e.g. the name and country of the customer. Further, it could be that you need the customer’s country when making a calculation in the order table. This is a field that you don't have in the order table. In SQL you would join the two tables to get country into the order table.
In the QlikView script you would instead first define the mapping table that maps a customer ID to a customer country as follows:
MapCustomerIDtoCountry: Mapping LoadCustomerID, Country From Customers ;
The second step is to use this information when reading the order table, such as:
Orders: Load *,
ApplyMap('MapCustomerIDtoCountry', CustomerID, null()) as Country From Orders ;
The "null()" as third parameter of the function defines what it should return when the customer ID isn't found in the mapping table.
And with this, you have joined the field Country with the order table – without using a join. And you have done it faster and with less chance of errors. Bottom line: Whenever you know that you just want to fetch one single value per record – don’t join. Use Applymap() instead.
If one of the tables is large I would definitely prefer five Applymaps.
Also, there is always a risk with joining: Even if you think you have a one-to-one relationship between the tables so that no records will be duplicated in the join, you often in fact have a one-to-many situation so that the number of records increases. Which leads to incorrect calculations. I have seen this happen many times in real life. This is a risk you do not have with Applymap.
I had three issues at work. I had found solution but will change to mapping tables
One is tables with records included twice (the same serial number is sometimes included two or three times) in it. So I found other ways to only select the first serial. But this was a little complex. Mapping is perfect
And customer groups. I wanted the top 20 customers (made up of around 200* customer numbers) grouped by cust combined name and the rest as other Customers. Mapping is also perfect as it allows the default option (Other or whatever) if cust num is not found
Another situation. I want to replace if found in a mapping table (replacement cust ref as the cust changed their numbers to a different format so we had the old number and new number ) but keep if it does not appear in this table
So thanks for this blog post (and the other one). Mapping tables is ideal
However, since there often is more than one way to do things in QlikView, I will point at alternative solutions for the two first problems. Then it is up to you to choose which you prefer...
Duplicates: The following load will only load the first of duplicate serial numbers: Load SerialNumber, ... where not Exists(SerialNumber)
Top 20 and others: Use "Dimension limits" (only in QV11) in the chart properties to define your top 20 customers. Then you get a dynamic evaluation (evaluates after the selection) as opposed to doing it in the script which is a static evaluation.
I only have QV10 at the moment but will try your solution when we upgrade. I will also look at where non exists. Re customers they have to be grouped anyway as one top 20 customer might have a lot more than 1 cust number in the main system and there is no way to do the linking in a main system. So I group the top 20 only and include the rest in other.
Of course I should have said will change to mapping tables in a copy to see how it goes. At present my set up works well (but is not perfect and carries the duplication risk) but mapping seems a much better option (and would have initially tried this if I knew what mapping tables did. I read the manual and could not work it out) and will swap across if it is an improvement after testing.
Is there a table size at which point its better to join? I have a table with 21 million rows. right now I'm doing 5 mapping tables. Would it be better to join all these together, or does the size of the table not matter?