Qlik Community

QlikView Scripting

Discussion Board for collaboration on QlikView Scripting.

Announcements
QlikView Fans! We’d love to hear from you.
Share your QlikView feedback with the product team… Click here to participate in our 5-minute survey.
Rules, plus terms and conditions, can be found here.
wigchert
New Contributor II

Fuzzy join possible?

Have the following issue:

One table with transaction data [TRANSACTIONS]. In this table there is a (free format) field that contains the description of the transaction.

Have created a mapping table with keywords and categories [MAPPINGS].

Now I like to use the mapping table in such a way that it searches for the keywords in the description field of the TRANSACTION table. It can then take the categories of the first matching/found keyword.

Are there possibilities in QlikView to do such a fuzzy join or fuzzy mapping?

Table TRANSACTIONS (simplified view):

  • DATE
  • VALUE
  • DESCRIPTION

Table MAPPINGS:

  • KEYWORD
  • CATEGORY
Tags (2)
1 Solution

Accepted Solutions
Luminary
Luminary

Fuzzy join possible?

Download Rob Wunderlich's cookbook that has an example of doing a fuzzy mapping or mapping with wildcards.

http://robwunderlich.com/Download.html

Regards.

10 Replies
vgutkovsky
Honored Contributor II

Fuzzy join possible?

Well, it's definitely not possible with mapping since that is a very simple 1-input-1-output method. As far as I know there is no built-in way to do a fuzzy join either, although you might be able to code this. For example, you might be able to search for a keyword using the index() or substringcount() functions. However, I imagine this would only work if you don't have too many unique values in the Keyword field. Essentially, the solution I had in mind would look like this:


JOIN (TRANSACTIONS) LOAD DISTINCT * RESIDENT MAPPINGS; //large outer join

Transactions_new:
LOAD
DATE,
VALUE,
DESCRIPTION,
CATEGORY
RESIDENT TRANSACTIONS
WHERE substringcount(DESCRIPTION,KEYWORD)>0;

DROP TABLE TRANSACTIONS;


This would create a Cartesian join (a join of every possible combination of both tables) because the fields Description and Keyword start off as disconnected. So, for example, if Transactions has 1M rows and Mappings has 10K rows, the resulting table from the outer join would contain 10B rows which I doubt your server (or your patience) could handle. So the short answer to your question is: maybe, if you don't have too many keywords.

Edit: By the way, if DESCRIPTION is not 100% unique, it would be cheaper to create an intermediate table that contains just DISTINCT DESCRIPTION and perform the outer join from MAPPINGS into that intermediate table. Afterwards, you can either leave the table as is or left join it back into TRANSACTIONS with CATEGORY now included.

Regards,

Luminary
Luminary

Fuzzy join possible?

Download Rob Wunderlich's cookbook that has an example of doing a fuzzy mapping or mapping with wildcards.

http://robwunderlich.com/Download.html

Regards.

vgutkovsky
Honored Contributor II

Fuzzy join possible?

Wow, I humbly defer to Rob on this. Brilliant solution!

wigchert
New Contributor II

Fuzzy join possible?

This indeed seems to be what I was looking for!

Thank you very much for making me aware of these solutions from Rob Wunderlich.

Great to have this forum available when you have the feeling that you got stuck Smile

amien
Valued Contributor

Fuzzy join possible?

where can i find the fuzzy option in the cookbook? can only find the wildcardmatch

also check out this option : mapsubstring

http://community.qlik.com/forums/t/25723.aspx

i had the following issues with the solution in the cookbook, perhaps someone allready found a solution for this? :

* issues when i have 2 hits

* i wanted to put the no hits in a seperate category, like 'others'

vgutkovsky
Honored Contributor II

Fuzzy join possible?

The solution is script\mappingWithWildcards.qvw

You can check for no hits easily in the front end by using a calculated dimension to show Category: if(len(trim(Category))>0,Category,'Others')

Regards,

amien
Valued Contributor

Fuzzy join possible?

that's for wildcards .. not fuzzy matching

i was really looking for a load solution

MVP
MVP

Fuzzy join possible?


Amien wrote:
that's for wildcards .. not fuzzy matching
i was really looking for a load solution<div></div>


Wigchert asked for "searches for the keywords", which the wildcard mapping can solve. It IS a load solution. The "no hits" check that Vlad is talking about can easily be added to the load if you want that additional piece.

But you appear to mean something different by "fuzzy matching". What do YOU mean by it? Something like if one table has "Amien" and another has "amian", it'll match them anyway, because they're close? So you'd need some sort of "closeness function", and then map to the row with the highest value for the "closeness function"? Something like that? That would be challenging.

Fuzzy join possible?


Amien wrote:
i had the following issues with the solution in the cookbook, perhaps someone allready found a solution for this? :
* issues when i have 2 hits
* i wanted to put the no hits in a seperate category, like 'others'


In the cookbook example, the last entry in the Wildmatch table is:
*, Other
* matches everything. This is intended to catch the "no-hits" and assign them to "Other".

What's the issue re 2 hits? Do you want to generate another row for the second (or more) hits?

-Rob

Community Browser