TFuzzyMatch using Levenshtein Method

Anonymous — Tue, 18 Mar 2014 14:37:08 GMT

Hi,
I would like to understand the matching logic when multiple key attributes are matched using the Levenshtein method, with min and max distance set to 0 and 5 respectively. Specifically: are records flagged as duplicates when they meet any single criterion, or only when they meet all of the criteria?
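
For reference, the Levenshtein distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. Below is a minimal Java sketch of that distance and of a min/max range check like the 0 and 5 mentioned above. The class and method names are hypothetical, and this is not taken from tFuzzyMatch itself; it only illustrates the metric.

```java
// Illustrative sketch only: not the code tFuzzyMatch runs internally.
class LevenshteinSketch {

    // Classic dynamic-programming edit distance using two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: true when the distance falls inside [min, max].
    static boolean withinRange(String a, String b, int min, int max) {
        int d = distance(a, b);
        return d >= min && d <= max;
    }

    public static void main(String[] args) {
        System.out.println(distance("Smith", "Smyth"));      // one substitution
        System.out.println(withinRange("Anna", "John", 0, 5));
    }
}
```

Note that on short fields a maximum distance of 5 is very permissive: "Anna" and "John" are 4 edits apart and would still fall inside a 0 to 5 range.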

Re: TFuzzyMatch using Levenshtein Method

Anonymous — Tue, 18 Mar 2014 15:30:01 GMT

sorry reply to wrong thread

Re: TFuzzyMatch using Levenshtein Method

Anonymous — Tue, 18 Mar 2014 18:06:47 GMT

Mr.M,
If you build a compound key of multiple columns, then all of them are taken into account together for the match, not each one individually.
I would also like to understand more about your data, use case, and ultimate goal so that I can better answer your question. There are several matching components. Which one are you using? A screencap of the job with the component settings would be very useful.

Re: TFuzzyMatch using Levenshtein Method

Anonymous — Wed, 19 Mar 2014 09:16:09 GMT

Hi,
We are trying to identify duplicate customers based on First Name, Last Name, Phone Number, Email, Address, and Zip Code. On Phone Number and Zip Code I have applied an exact match, and on the others the Levenshtein method.
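
To make the setup concrete, here is a rough Java sketch of this kind of comparison under "all criteria must pass" semantics: exact equality on phone and ZIP, and a Levenshtein threshold on the remaining fields. The field names, the case-insensitive comparison, and the assumption that every field is filled are all illustrative choices, not documented Talend behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: field names and semantics are assumptions,
// not how any Talend component is implemented.
class CustomerMatchSketch {

    static final String[] EXACT_FIELDS = {"phone", "zip"};
    static final String[] FUZZY_FIELDS = {"firstName", "lastName", "email", "address"};

    // Standard dynamic-programming edit distance with two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Every key attribute must pass its own test (assumes no field is null).
    static boolean isDuplicate(Map<String, String> x, Map<String, String> y, int maxDistance) {
        for (String f : EXACT_FIELDS) {
            if (!x.get(f).equals(y.get(f))) return false;
        }
        for (String f : FUZZY_FIELDS) {
            String a = x.get(f).toLowerCase();
            String b = y.get(f).toLowerCase();
            if (levenshtein(a, b) > maxDistance) return false;
        }
        return true;
    }

    // Hypothetical convenience builder for a customer record.
    static Map<String, String> customer(String firstName, String lastName, String phone,
                                        String email, String address, String zip) {
        Map<String, String> r = new HashMap<>();
        r.put("firstName", firstName);
        r.put("lastName", lastName);
        r.put("phone", phone);
        r.put("email", email);
        r.put("address", address);
        r.put("zip", zip);
        return r;
    }
}
```

With this sketch, "Jon Smith, 12 Main St" and "John Smyth, 12 Main Street" sharing the same phone, email, and ZIP would be treated as duplicates at a maximum distance of 5, but not at 2, because the two address strings are 4 edits apart.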

Re: TFuzzyMatch using Levenshtein Method

Anonymous — Wed, 19 Mar 2014 10:12:40 GMT

Also, I want to understand how the tFuzzyMatch logic treats missing values.

Re: TFuzzyMatch using Levenshtein Method

Anonymous — Thu, 20 Mar 2014 13:42:21 GMT

Hi,
To follow up, I would also like to know whether Talend's tFuzzyMatch supports the following.
Let us say I want to match on Name, Address, Email, and Phone Number:
1. What if some fields are empty for some records, i.e. the fill rate is less than 100%? How does Talend handle matching in that scenario?
2. Can we specify multiple rules in one go, such as (Name, Address, Email, Phone Number) or (Name, Email, Phone Number) or (Name, Email) or (Name, Phone Number)? That is, if any of these 4 rules is satisfied, Talend should return the records as duplicates.
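
To illustrate what I mean by the two points above, here is a small Java sketch of the behaviour I am after: a list of rules combined with OR, where a rule is skipped whenever one of its fields is blank in either record. The class name, field names, and the blank-handling policy are my own assumptions for illustration, not something I know Talend to do.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: the rule format and blank policy are assumptions.
class RuleSetSketch {

    // The four rules from the question; a record pair matches if ANY rule matches.
    static final List<List<String>> RULES = Arrays.asList(
            Arrays.asList("name", "address", "email", "phone"),
            Arrays.asList("name", "email", "phone"),
            Arrays.asList("name", "email"),
            Arrays.asList("name", "phone"));

    // Standard dynamic-programming edit distance with two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // A rule matches only if every one of its fields is filled in both
    // records and within maxDistance (assumed policy: blanks never match).
    static boolean ruleMatches(List<String> fields, Map<String, String> x,
                               Map<String, String> y, int maxDistance) {
        for (String f : fields) {
            String a = x.get(f);
            String b = y.get(f);
            if (isBlank(a) || isBlank(b)) return false;
            if (levenshtein(a, b) > maxDistance) return false;
        }
        return true;
    }

    static boolean isDuplicate(List<List<String>> rules, Map<String, String> x,
                               Map<String, String> y, int maxDistance) {
        for (List<String> rule : rules) {
            if (ruleMatches(rule, x, y, maxDistance)) return true;
        }
        return false;
    }

    // Hypothetical builder: record("name", "Ann", "email", "ann@example.com").
    static Map<String, String> record(String... keyValues) {
        Map<String, String> r = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            r.put(keyValues[i], keyValues[i + 1]);
        }
        return r;
    }
}
```

One thing this sketch makes visible: if all rules share a single distance threshold, any pair that satisfies the first two rules also satisfies (Name, Email), so the longer rules only add something when each rule can carry its own thresholds.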

Re: TFuzzyMatch using Levenshtein Method

Anonymous — Mon, 28 Mar 2016 11:24:16 GMT

Hi,
I am using Talend Open Studio version 6.1. Is it possible to perform in-line matching using the tFuzzyMatch component? I want to match on more than one column, such as firstname, lastname, address, zip, and phone number. Also, is it possible to get separate outputs for duplicate and unique values using this component?

Re: TFuzzyMatch using Levenshtein Method

Anonymous — Tue, 29 Mar 2016 09:30:50 GMT

Hi,

For your in-line operation, could you please elaborate on your case with an example showing input and expected output values?

There is a component, TalendHelpCenter:tRecordMatching, which joins two tables by doing a fuzzy match on several columns using a wide variety of comparison algorithms (you can define several keys).
Note: This component is available in the Palette of Talend Studio only if you have subscribed to one of the Talend Platform products.
Best regards
Sabrina