Some time ago I wrote a blog on Mapping functions and described how they can be used to replace or modify field values when you run the script. But how do you know when to map versus join the data in your data model? Mapping works well when you need to look up a single value in another table. For instance, you may have a products table with product data like the table below and you want to add the product category name to that table.
The product category name is in another table that looks like this:
Now you can add the ProductCategory field to the Products table by doing a join and that would work fine but you can also add the ProductCategory field by simply mapping.
Using a join:
Using a map:
Since we only want to add one value to the Products table, mapping is a safer option. With this small sample data, either will work but sometimes when you have a large data set, you have be cautious when using joins. You need to watch out for new records being added to the table as a result of the join thus potentially changing calculations.
While both a join and a map can work to combine data from two tables, in cases where only one value needs to be added, choose to map. It is an easier approach and it reduces the chance of errors being made in your data model. Now I am not saying that joins are bad and should not be used because that is not the case at all. I am simply stating that mapping should always be used versus a join when you only need one value.
gwassenaar has beaten me to posting a link to my own blog post (thanks Gysbert), but in there you will find what I like about ApplyMap, and as Rob mentions, it is the flexibility it offers. It always makes me smile when I use an ApplyMap in a MAPPING load statement or use ApplyMap as the third parameter of an ApplyMap statement. Nesting in this way is something you would not be able to do with JOIN alone.
While I normally agree with the great advice HIC & Rob W. provide and follow the general practices the Qlik blogosphere advocates, I find that the claims of Mapping being less error-prone than Joining do not fully represent the issue (this comment has nothing to do with performance, only the "error-prone" claim). From the perspective of a business analyst and/or somebody not comfortable with relation algebra (e.g.: SQL), I can see how Mapping seems better. Now, I understand that QlikView is associative (as opposed to relational) and intended more for business analysts than for developers, but from the my perspective as both a general and database developer, Mapping is simply another option with its own advantages and disadvantages. Specifically, the often-mentioned problem case of multiple rows when joining is also a problem for mapping; mapping, instead, chooses 1 of the many at random (perhaps it is mapping load order), which can be just as much of a problem for calculation accuracy and can mask the need for grouping and aggregating your mapping table.
Excellent points Peter. I would respond to the "error-prone" claim; mapping guarantees only a single match (which by the way is the first match in load order). It's not necessarily "better" but offers a degree of certainty which is desirable in many situations.
One of the challenges we often face in BI is translating the current available data model into a model accurate for historical analysis. For example, the operational ERP only cares about "what is the price today, when I create this order", whereas the historical analysis needs to know what the price was when the order was cut which may or may not be be recorded in the fact data.
Features such as Join, ApplyMap, or IntervalMatch address different scenarios of data availability, currency and operational practices. I think it's good to know and understand all of these capabilities as it provides the options to create useful analytics with the given available data.
Mapping is indeed just another option with its own advantages and disadvantages, and should not always be used. There are plenty of cases where a join is better.
However, I still claim that Applymap() is less error-prone, and I base that statement solely on my empirical experience: I have seen countless cases where joins have been used to get one value from a different table, and the developer "knows" that it is a one-to-one relationship. But it turns out that it isn't, and as a result QlikView gets an incorrect number of records, with wrong numbers as a consequence.
In a situation where the developer knows that there should only be one value, Applymap() becomes the best choice, since you then use this à priori knowledge.