jay6
Contributor III

Fuzzy matching fields/columns

Hello.
I'm new to Talend and have a question about fuzzy matching. I've searched through the forum but was unable to find what I was hoping to find. This is what I'm trying to do:
I have a record which looks like this:
Address 1,Address 2,Address 3
Flat 187,187 Tom Street,Tom Street

As you can see, some of the information is duplicated between the address fields. I want to 'fuzzy' match the data to get a score indicating what percentage of the data in one field matches another field, e.g. 187 from Address 1 matches 187 from Address 2. I need to do this within each record, field against field, and not against some lookup.
I've tried a few of the components provided with Talend Studio 6.2, but none of them do what I'm trying to achieve, and I can't find any external components that do either. Everything I have searched for has brought up two components (tFuzzyMatch and tRecordMatching), which I have tried, but they do not provide the results I'm looking for. Hopefully someone here has a solution or some direction on how to achieve this. I find it hard to believe that no component exists for this, as it is part of the basic data cleansing process needed to ensure you have 99% accurate data in the output files.
Thanks for your help in advance.

3 Replies
Anonymous
Not applicable

Hi,
Could you also give us your expected result, please?
Best regards
Sabrina
jay6
Contributor III
Author

Hi Sabrina,
Sure. Essentially, I want to check Address 1 and 2 first to see if anything matches there. If it does, it should give me a score. In this case the score would be, say, 20 for matching 187 in both columns.
Then I want to check Address 2 and 3, which would return another score. In this case the score would be much higher, as the match is greater, probably around 80-90 for matching Tom Street in both columns.
I don't know if there is a way to compare all 3 address columns at the same time. If there is, I expect it would work differently, as you would get an overall score instead of individual scores. However, I would prefer this to work as per my example above. This will create quite a few exceptions, but that is fine, as that is exactly what we want so that we can make sure the data is then cleaned and presented correctly. Here is an example of what I would expect to see in the exceptions output:
Address 1,Address 2,Score
Flat 187,187 Tom Street,20
Address 2,Address 3,Score
187 Tom Street,Tom Street,85
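To make the scoring idea concrete, here is a minimal sketch written outside Talend, in plain Python. It is an assumption about one way such a score could be computed, not a Talend component or API: it uses the character-level similarity ratio from the standard library's difflib.SequenceMatcher, and the names field_score and adjacent_pair_scores are hypothetical. A different measure (token overlap, Levenshtein, Jaro-Winkler) would give different numbers.

```python
from difflib import SequenceMatcher


def field_score(a: str, b: str) -> int:
    """Return a 0-100 similarity score between two field values.

    Hypothetical helper: lowercases both values, collapses whitespace,
    then scales difflib's similarity ratio to a percentage.
    """
    a_norm = " ".join(a.lower().split())
    b_norm = " ".join(b.lower().split())
    if not a_norm or not b_norm:
        return 0  # an empty field cannot match anything
    return round(SequenceMatcher(None, a_norm, b_norm).ratio() * 100)


def adjacent_pair_scores(record: dict) -> list:
    """Score each field against the next one (1 vs 2, 2 vs 3, ...)."""
    names = list(record)
    return [
        (left, right, field_score(record[left], record[right]))
        for left, right in zip(names, names[1:])
    ]


record = {
    "Address 1": "Flat 187",
    "Address 2": "187 Tom Street",
    "Address 3": "Tom Street",
}

for left, right, score in adjacent_pair_scores(record):
    print(f"{left},{right},{score}")
```

For the sample record this prints 27 for Address 1 against Address 2 and 83 for Address 2 against Address 3, which is in the same range as the scores I described. The same logic could presumably be ported to a Java routine and called per row, but that part is untested.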
Thanks
Jay
jay6
Contributor III
Author

Hi,
It seems the Talend team may not have a solution for this, since there has been no response. Does anyone else have any suggestions for me, please?
Thanks
Jay