Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
wigchert
Partner - Contributor II
Partner - Contributor II

Fuzzy join possible?

Have the following issue:

One table with transaction data [TRANSACTIONS]. In this table there is a (free format) field that contains the description of the transaction.

Have created a mapping table with keywords and categories [MAPPINGS].

Now I like to use the mapping table in such a way that it searches for the keywords in the description field of the TRANSACTION table. It can then take the categories of the first matching/found keyword.

Are there possibilities in QlikView to do such a fuzzy join or fuzzy mapping?

Table TRANSACTIONS (simplified view):

  • DATE
  • VALUE
  • DESCRIPTION

Table MAPPINGS:

  • KEYWORD
  • CATEGORY
1 Solution

Accepted Solutions
pover
Luminary Alumni
Luminary Alumni

Download Rob Wunderlich's cookbook that has an example of doing a fuzzy mapping or mapping with wildcards.

http://robwunderlich.com/Download.html

Regards.

View solution in original post

10 Replies
vgutkovsky
Master II
Master II

Well, it's definitely not possible with mapping since that is a very simple 1-input-1-output method. As far as I know there is no built-in way to do a fuzzy join either, although you might be able to code this. For example, you might be able to search for a keyword using the index() or substringcount() functions. However, I imagine this would only work if you don't have too many unique values in the Keyword field. Essentially, the solution I had in mind would look like this:


JOIN (TRANSACTIONS) LOAD DISTINCT * RESIDENT MAPPINGS; //large outer join

Transactions_new:
LOAD
DATE,
VALUE,
DESCRIPTION,
CATEGORY
RESIDENT TRANSACTIONS
WHERE substringcount(DESCRIPTION,KEYWORD)>0;

DROP TABLE TRANSACTIONS;


This would create a Cartesian join (a join of every possible combination of both tables) because the fields Description and Keyword start off as disconnected. So, for example, if Transactions has 1M rows and Mappings has 10K rows, the resulting table from the outer join would contain 10B rows which I doubt your server (or your patience) could handle. So the short answer to your question is: maybe, if you don't have too many keywords.

Edit: By the way, if DESCRIPTION is not 100% unique, it would be cheaper to create an intermediate table that contains just DISTINCT DESCRIPTION and perform the outer join from MAPPINGS into that intermediate table. Afterwards, you can either leave the table as is or left join it back into TRANSACTIONS with CATEGORY now included.

Regards,

pover
Luminary Alumni
Luminary Alumni

Download Rob Wunderlich's cookbook that has an example of doing a fuzzy mapping or mapping with wildcards.

http://robwunderlich.com/Download.html

Regards.

vgutkovsky
Master II
Master II

Wow, I humbly defer to Rob on this. Brilliant solution!

wigchert
Partner - Contributor II
Partner - Contributor II
Author

This indeed seems to be what I was looking for!

Thank you very much for making me aware of these solutions from Rob Wunderlich.

Great to have this forum available when you have the feeling that you got stuck Smile

amien
Specialist
Specialist

where can i find the fuzzy option in the cookbook? can only find the wildcardmatch

also check out this option : mapsubstring

http://community.qlik.com/forums/t/25723.aspx

i had the following issues with the solution in the cookbook, perhaps someone allready found a solution for this? :

* issues when i have 2 hits

* i wanted to put the no hits in a seperate category, like 'others'

vgutkovsky
Master II
Master II

The solution is script\mappingWithWildcards.qvw

You can check for no hits easily in the front end by using a calculated dimension to show Category: if(len(trim(Category))>0,Category,'Others')

Regards,

amien
Specialist
Specialist

that's for wildcards .. not fuzzy matching

i was really looking for a load solution

johnw
Champion III
Champion III


Amien wrote:
that's for wildcards .. not fuzzy matching
i was really looking for a load solution<div></div>


Wigchert asked for "searches for the keywords", which the wildcard mapping can solve. It IS a load solution. The "no hits" check that Vlad is talking about can easily be added to the load if you want that additional piece.

But you appear to mean something different by "fuzzy matching". What do YOU mean by it? Something like if one table has "Amien" and another has "amian", it'll match them anyway, because they're close? So you'd need some sort of "closeness function", and then map to the row with the highest value for the "closeness function"? Something like that? That would be challenging.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP


Amien wrote:
i had the following issues with the solution in the cookbook, perhaps someone allready found a solution for this? :
* issues when i have 2 hits
* i wanted to put the no hits in a seperate category, like 'others'


In the cookbook example, the last entry in the Wildmatch table is:
*, Other
* matches everything. This is intended to catch the "no-hits" and assign them to "Other".

What's the issue re 2 hits? Do you want to generate another row for the second (or more) hits?

-Rob