My latest blog post was on joins in QlikView (see To Join or not to Join). In it I claimed that you should avoid making joins in the QlikView script, if possible. This blog post is about a function that can help you avoid joins. It is about the function Applymap().
It is, in fact, one of my favorite functions and I do not seem to be alone in this choice. Several of the QlikView developers who regularly write on QlikCommunity seem to share this preference and have written about this function, e.g., QlikView Notes, Quick Intelligence and QVApps.
So, what does the function do? Well, basically it is just a lookup function – it takes one value as input, checks if this value exists in a mapping table and returns the corresponding value from the second column in the mapping table. Think of it as a translation function where the translation is fetched from a pre-defined, two-column table. A simple translation function, but you can still do a lot with it...
What has this got to do with joins? Everything. A very common situation in data modeling is that you need to get a single field value (per record) from a different table than the current one. Then the standard way to do this is to use a join. However, in QlikView you can – and should – use Applymap() instead. The only time that you cannot use Applymap() instead of a join is if you want to fetch more than one single corresponding value (record) from the second table.
Let's look at an example of how to use Applymap(): In an ERP system, the customer table is the table where all information about the customer is stored, e.g. the name and country of the customer. Further, it could be that you need the customer’s country when making a calculation in the order table. This is a field that you don't have in the order table. In SQL you would join the two tables to get country into the order table.
In the QlikView script you would instead first define the mapping table that maps a customer ID to a customer country as follows:
MapCustomerIDtoCountry: Mapping LoadCustomerID, Country From Customers ;
The second step is to use this information when reading the order table, such as:
Orders: Load *,
ApplyMap('MapCustomerIDtoCountry', CustomerID, null()) as Country From Orders ;
The "null()" as third parameter of the function defines what it should return when the customer ID isn't found in the mapping table.
And with this, you have joined the field Country with the order table – without using a join. And you have done it faster and with less chance of errors. Bottom line: Whenever you know that you just want to fetch one single value per record – don’t join. Use Applymap() instead.
Is that many to many columns, or lots of one to one relationships, or one column key but many columns to join?
If it is many to many then you will need to build a composite key (A & ':' & B & ':' & C as Key,) in order to do an ApplyMap. With a JOIN you will not need to do this, but multiple columns will make the chances of a broken join higher. There are still lots of reasons to use ApplyMap here though.
If it is lots of one to one relationships using ApplyMap for each of them will definitely be a good thing, as it will keep your data model tidy.
If it is one column in the key, but many columns to join, then it is a trade off. The first thing to check is the granularity of the two tables, if there are duplicates in the table you are bringing in for a single key it must be an association (i.e. load both tables separately - no JOIN or ApplyMap). If that is not a problem it's a case of how many columns you are bringing in, two or three I would probably do separate ApplyMap statements for each column, beyond that I would generally go for an association. JOIN I would only use rarely, when a preceding load or RESIDENT required all fields to be in the same table.
You need to consider which result you need in the scenario you describe. If you use ApplyMap, you will get one (and only one..) country "mapped" to you customerID. Which on that will be mapped, will depend on the order of the Mapping table.
If you use JOIN, then you will get multiple rows in your source table if you have more than one country record for you your customer, so that will give you wrong data in your source table.
The only workaround, will be to have more than the CustomerID to define the country to be mapped. Basically you need to know when a given customerID should be mapped to country A and when it should be mapped to country B because Qlik can't guess it :-).