Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Employee
Employee

Don't join - use Applymap instead

My latest blog post was on joins in QlikView (see To Join or not to Join). In it I claimed that you should avoid making joins in the QlikView script, if possible. This blog post is about a function that can help you avoid joins. It is about the function Applymap().

It is, in fact, one of my favorite functions and I do not seem to be alone in this choice. Several of the QlikView developers who regularly write on QlikCommunity seem to share this preference and have written about this function, e.g., QlikView Notes, Quick Intelligence and QVApps.

 

So, what does the function do? Well, basically it is just a lookup function – it takes one value as input, checks if this value exists in a mapping table and returns the corresponding value from the second column in the mapping table. Think of it as a translation function where the translation is fetched from a pre-defined, two-column table. A simple translation function, but you can still do a lot with it...

 

What has this got to do with joins? Everything. A very common situation in data modeling is that you need to get a single field value (per record) from a different table than the current one. Then the standard way to do this is to use a join. However, in QlikView you can – and should – use Applymap() instead. The only time that you cannot use Applymap() instead of a join is if you want to fetch more than one single corresponding value (record) from the second table.

 

Let's look at an example of how to use Applymap(): In an ERP system, the customer table is the table where all information about the customer is stored, e.g. the name and country of the customer. Further, it could be that you need the customer’s country when making a calculation in the order table. This is a field that you don't have in the order table. In SQL you would join the two tables to get country into the order table.

 

In the QlikView script you would instead first define the mapping table that maps a customer ID to a customer country as follows:

 

     MapCustomerIDtoCountry:
     Mapping Load CustomerID, Country From Customers ;

 

The second step is to use this information when reading the order table, such as:

 

     Orders:
     Load *,

          ApplyMap('MapCustomerIDtoCountry', CustomerID, null()) as Country
          From Orders ;

Tables2.png

 

The "null()" as third parameter of the function defines what it should return when the customer ID isn't found in the mapping table.

 

And with this, you have joined the field Country with the order table – without using a join. And you have done it faster and with less chance of errors. Bottom line: Whenever you know that you just want to fetch one single value per record – don’t join. Use Applymap() instead.

 

HIC

 

See more in the Technical Brief on Joins and Lookups.

81 Comments

I'm glad to read this from a more experienced and trustworthy voice!

2,020 Views
Not applicable

The real question could be for a case where you would to have to retreive 5 fields from another table.

In terms of performance what would you prefer ?

A single join or 5 applymap?

Regards,

DA

2,020 Views
Employee
Employee

If one of the tables is large I would definitely prefer five Applymaps.

Also, there is always a risk with joining: Even if you think you have a one-to-one relationship between the tables so that no records will be duplicated in the join, you often in fact have a one-to-many situation so that the number of records increases. Which leads to incorrect calculations. I have seen this happen many times in real life. This is a risk you do not have with Applymap.

HIC 

2,020 Views
MVP
MVP

Hi Henric - many thanks for the mention and the link.  Happy to read that my thoughts on the ApplyMap function are shared.

0 Likes
2,020 Views
Not applicable

Hi Henric/Steven - I am new to QV, but find ApplyMap and your posts very helpful. Thank you.

0 Likes
2,020 Views
robert99
Valued Contributor II

Thanks. This is perfect for me

I had three issues at work. I had found solution but will change to mapping tables

One is tables with records included twice (the same serial number is sometimes included two or three times)  in it. So I found other ways to only select the first serial. But this was a little complex. Mapping is perfect

And customer groups. I wanted the top 20 customers (made up of around 200* customer numbers) grouped by cust combined name  and the rest as other Customers. Mapping is also perfect as it allows the default option (Other or whatever) if cust num is not found

Another situation. I want to replace if found in a mapping table (replacement cust ref as the cust changed their numbers to a different format so we had the old number and new number  ) but keep if it does not appear in this table

So thanks for this blog post (and the other one). Mapping tables is ideal

0 Likes
2,020 Views
Employee
Employee

Glad you like it!

However, since there often is more than one way to do things in QlikView, I will point at alternative solutions for the two first problems. Then it is up to you to choose which you prefer...

  • Duplicates:
    The following load will only load the first of duplicate serial numbers:
    Load SerialNumber, ... where not Exists(SerialNumber)
  • Top 20 and others:
    Use "Dimension limits" (only in QV11) in the chart properties to define your top 20 customers. Then you get a dynamic evaluation (evaluates after the selection) as opposed to doing it in the script which is a static evaluation.

HIC

0 Likes
2,020 Views
robert99
Valued Contributor II

Thanks for these recommendations

I only have QV10 at the moment but will try your solution when we upgrade. I will also look at where non exists. Re customers they have to be grouped anyway as one top 20 customer might have a lot more than 1 cust number in the main system and there is no way to do the linking in a main system. So I group the top 20 only and include the rest in other.

Of course I should have said will change to mapping tables in a copy to see how it goes. At present my set up works well (but is not perfect and carries the duplication risk) but mapping seems a much better option (and would have initially tried this if I knew what mapping tables did. I read the manual and could not work it out) and will swap across if it is an improvement after testing.

0 Likes
2,020 Views
Employee
Employee

If one customer can have more than one customer ID, then Applymap is without any doubt the best way to "consolidate" them into one ID. So, good luck!

HIC

0 Likes
2,020 Views
Not applicable

Is there a table size at which point its better to join?  I have a table with 21 million rows.  right now I'm doing 5 mapping tables.  Would it be better to join all these together, or does the size of the table not matter?

wrj

0 Likes
2,020 Views
Employee
Employee

I might use the Join prefix for small tables, but not for big since it uses a lot of memory. So, the bigger table, the more reason to avoid the Join prefix.

In your case, I would ask myself whether there is a risk with joining (record duplication), and if not, if I can do it on the database (using a JOIN inside the SELECT). Because that could be an option.

But if you cannot do it on the database, I would stick to 5 mapping tables.

Finally, we should always ask ourselves: What am I trying to optimize? Here the answer is probably script execution time. Next question is whether script execution time is a problem. If it isn't and I have a functioning solution, well then  ... I personally often find myself optimizing things that don't need optimization. Because it is a fun challenge...

HIC

2,020 Views
Not applicable

gr8 gr8 gr8!!!!

0 Likes
2,020 Views
Not applicable

Henric,

THIS

MADE

MY

DAY (or week)

I was crashing my head to have an intricate join yield unique values, where duplicates where coming up and expected due to the nature of the tables...

This solution is absolutely simple, lean, fast, harmless and PERFECT!

just one question, of which I'm 99% sure of the answer.

When the Mapping table has

CustomerID     Country

2                    Germany

2                    France

The function is picking up Germany I suppose, i.e. the first correspondence found for the mapped key, right?

Again thank you!!

0 Likes
2,020 Views
Employee
Employee

Glad I could help you.

And, yes, you are right about duplicates in the mapping table. For the sake of simplicity, it picks up the first occurance.

HIC

Read more on my other blog posts.

2,020 Views
kalyandg
Contributor II

Dear Henric,

Thank you for posting such a useful info, am doing project, in which i have linked five tables with one table.

for example, the script as follows in my project

Master:

Item,

Product,

ProductGroup,

ItemLocation

Size

ProductName,

Level1:

Item,

L1Code

Level2:

Product,

L2Code

Level3:

Product Group,

L3Code

Level4:

ItemLocation,

L4Code

Level5:

Size,

L5Code

Level6:

ProductName,

L6Code

these all links each with corresponding fields in master table

but for optimizing application, i have to use applymap and mapping load

please help me to solve this, and provide me a script code for each level and how to map as in my script.

0 Likes
2,020 Views
Employee
Employee

If you want to include the L-codes into the master table, then it is straightforward. First create a mapping table:

          'L1Map':

          Mapping Load Item, L1Code From Level1;

and then use this information in the Master table load statement.

          ApplyMap('L1Map', Item) as L1Code

The same should be done for all 6 levels, i.e six different ApplyMap() calls, each creating a new field.

If you have more questions on your specific problem, I think it would be better to enter it as a question on our discusion forum.

HIC

0 Likes
2,020 Views
kalyandg
Contributor II

Dear Henric,

Thanks for your reply i have one doubt wheather we have to  load Master table first or Level table and also we have more than two fields in level 2 table what we have to do? 

0 Likes
2,020 Views
Employee
Employee

You have to do the mapping loads of all levels before you load the master table with Applymap.

The mapping load can only have two fields, but the table you load from can of course have more. Just make a load statement with the two correct fields and it will work.

HIC

0 Likes
2,020 Views
ebg
New Contributor III

Hi Henric,

Whats your take on using 5 Lookup() against the one table instead of creating 5 mapping loads and 5 ApplyMaps()?

Is there performance to be won in using ApplyMap() instead of Lookup()?

Regards

Jesper

0 Likes
2,020 Views
Employee
Employee

The performance of applymap is much better, so I would use 5 mapping tables and 5 applymap calls.

HIC

0 Likes
2,020 Views
kalyandg
Contributor II

hi Henric,

Now it is working, thanks

0 Likes
2,020 Views
jmm
New Contributor

I absolutely love applymap()! It's great to have this confirmed by you Henric!

I just wanted to emphasize the importance of the third parameter, where you can "fill-in" the nulls with a default value (i.e. "MISSING") so users can click on it for further investigation. Filling nulls usually takes two LOAD statements if you are using joins. Applymap rocks!

Juan

0 Likes
2,020 Views
Not applicable

Hi Henric,

Thanks a lot for this great tip.

I do have a question still, regarding the possibility of using ApplyMap instead of Join. You state:

"The only time that you cannot use Applymap() instead of a join is if you want to fetch more than one single corresponding value (record) from the second table."

My question is, if the Key for mapping (or join) is over two different fields, mapping is not possible. Am I right?

Ex.: Imagine that to determine the Country you would need to use not only the CustomerID but also the Region (that also existed in the Orders table)

Thanks in advance for your help.

Either way this is very usefull.

Rui

0 Likes
2,020 Views
MVP
MVP

Hi Rui,

You simply need to create a composite key by concatenating multiple fields.  See my blog post for details on how to acheive this: http://bit.ly/kQcAZ5

Steve

http://www.quickintelligence.co.uk/

0 Likes
2,020 Views
Not applicable

Simple, easy and functional.

Just perfect!

Thank you all.

Rui

0 Likes
2,020 Views
Employee
Employee

Steve is right. Using a composite key, you can do almost anything. My comment was about if you want two or more values from the same field. Then you need to join.

HIC

0 Likes
2,020 Views
montubhardwaj
Valued Contributor

Thanks for the wonderful post.

0 Likes
2,020 Views
wonoh0817
New Contributor III

I was told to use 'ApplyMap' if data set is lower than 10 million rows by a QlikView instructor. If it's more than 10 million rows, he said that 'Join' would be performing better. Any thoughts?

0 Likes
2,020 Views
MVP
MVP

This seems odd to me.  I would have thought that the more rows you have all the more reason to avoid doing JOINs.

Steve

http://www.quickintelligence.co.uk/

0 Likes
2,020 Views
Employee
Employee

I agree with Steve. I would not use a Join it the table is large.

HIC

0 Likes
2,020 Views