Sort of Fuzzy Matching

kdv — Thu, 04 Oct 2018 19:19:35 GMT

Warning, newbie question!

I have two files that I am trying to merge on a specific field (inner join). File A has a reasonably clean reference field that is easy to parse and use. File B, on the other hand, is an amalgamation of data from a variety of sources, so its reference field comes in all sorts of shapes and sizes. I still want to be able to match them. Here is a practical, fictitious example of a reference in the two files:

File A: "Joe Bloggs"

File B: Fund Transfer : JoeBloggsACME-883366133256 : JOE BLOGGS BLOGGS Debit Account: 12196895 Credit Account: 12856966

Here is another example (from the same two files as the above example) to help show how different it can be, even within the same files:

File A: 432046055941

File B: "REF 432046055941"

Clearly a plain inner join won't work. However, as you can see, there is enough common text between the two fields that I should be able to match them. The format is just not consistent, so I can't build a single string manipulation formula. I have dabbled with the tFuzzyMatch component, but I didn't get great results, and I suspect it is too "high brow" for my problem.

Can anybody suggest another component or setting, or point me in the right direction, please?

Thanks

Re: Sort of Fuzzy Matching

vapukov — Fri, 05 Oct 2018 06:56:48 GMT

If it is always (!!!) as described in the examples, you just need

StringHandling.INDEX("hello world!","hello") != -1

If reality is more complicated, it needs more thought.
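
One caveat on the check above: a plain substring lookup such as StringHandling.INDEX is sensitive to case and spacing. In the first example, "Joe Bloggs" appears in File B only as "JoeBloggs" and "JOE BLOGGS", so a literal search would miss it. Below is a minimal sketch of one way around that, assuming it is acceptable to lower-case both fields and drop everything except letters and digits before checking containment. The class and method names (ReferenceMatch, normalize, matches) are hypothetical and not part of Talend.

```java
import java.util.Locale;

// Hypothetical helper, not a Talend class: normalized "contains" matching.
final class ReferenceMatch {
    private ReferenceMatch() {}

    // Lower-case the text and drop everything except ASCII letters and digits,
    // so "Joe Bloggs", "JoeBloggs" and "JOE BLOGGS" all become "joebloggs".
    static String normalize(String s) {
        if (s == null) {
            return "";
        }
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // True when the normalized File A key occurs inside the normalized
    // File B text. An empty key never matches.
    static boolean matches(String fileAKey, String fileBText) {
        String key = normalize(fileAKey);
        return !key.isEmpty() && normalize(fileBText).contains(key);
    }

    public static void main(String[] args) {
        // The two fictitious examples from the question.
        String b1 = "Fund Transfer : JoeBloggsACME-883366133256 : JOE BLOGGS BLOGGS "
                + "Debit Account: 12196895 Credit Account: 12856966";
        System.out.println(matches("\"Joe Bloggs\"", b1));
        System.out.println(matches("432046055941", "\"REF 432046055941\""));
    }
}
```

Stripping spaces and punctuation can produce false positives for very short keys, so a minimum key length may be worth adding. How this gets wired into the job (for example as a user routine called from a filter expression) is left open here.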
