Hello.
I'm new to Talend and have a question about fuzzy matching. I've searched the forum but couldn't find an answer. This is what I'm trying to do:
I have a record that looks like this:
Address 1,Address 2,Address 3
Flat 187,187 Tom Street,Tom Street
As you can see, some of the information is duplicated across the address fields. I want to 'fuzzy' match the fields against each other so that I get a score indicating what percentage of one field's data matches another field. For example, 187 in Address 1 matches 187 in Address 2, and based on that I should get some kind of matching score. I need to do this at record and field level, not against a lookup.
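As a rough illustration of the kind of field-to-field score described above, here is a minimal plain-Java sketch. It is not a Talend component or API; the class and method names (AddressFieldSimilarity, tokenOverlapPercent, editSimilarityPercent) are hypothetical. It assumes two scoring choices: token overlap (Jaccard similarity over lowercase alphanumeric words) and a normalized Levenshtein edit distance. Logic like this could be placed in a custom routine and called per row, for example from a tJavaRow, but treat that wiring as an assumption to verify in your own Studio version.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: scores how similar two fields of the SAME record are.
public class AddressFieldSimilarity {

    // Split a field into lowercase tokens; only ASCII letters/digits are kept (an assumption).
    public static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        if (s == null) return out;
        for (String t : s.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t); // skip the empty token produced by leading separators
        }
        return out;
    }

    // Jaccard similarity |A ∩ B| / |A ∪ B| over word tokens, as a percentage.
    public static double tokenOverlapPercent(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        if (union.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(ta);
        inter.retainAll(tb);
        return 100.0 * inter.size() / union.size();
    }

    // Classic Levenshtein edit distance using two rolling rows.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds the row just computed
        }
        return prev[b.length()];
    }

    // Character-level similarity: 100% minus edit distance relative to the longer string.
    public static double editSimilarityPercent(String a, String b) {
        String x = a == null ? "" : a.toLowerCase(Locale.ROOT);
        String y = b == null ? "" : b.toLowerCase(Locale.ROOT);
        int max = Math.max(x.length(), y.length());
        if (max == 0) return 100.0;
        return 100.0 * (1.0 - (double) levenshtein(x, y) / max);
    }

    public static void main(String[] args) {
        String[] names = {"Address 1", "Address 2", "Address 3"};
        String[] fields = {"Flat 187", "187 Tom Street", "Tom Street"};
        // Compare every pair of fields within the one record.
        for (int i = 0; i < fields.length; i++) {
            for (int j = i + 1; j < fields.length; j++) {
                System.out.println(String.format("%s vs %s: tokens %.1f%%, edit %.1f%%",
                        names[i], names[j],
                        tokenOverlapPercent(fields[i], fields[j]),
                        editSimilarityPercent(fields[i], fields[j])));
            }
        }
    }
}
```

For the sample record, the token-overlap score works out to 25% for Address 1 vs Address 2 (only "187" is shared out of four distinct words), 0% for Address 1 vs Address 3, and about 66.7% for Address 2 vs Address 3. Token overlap ignores word order, which suits addresses; the edit-distance score is more sensitive to typos within a word. Which one (or what weighting of both) is appropriate is a judgment call for your data.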
I've tried a few of the components that ship with Talend Studio 6.2, but none of them does what I'm trying to achieve, and I can't find any external components that can either. Every search brings up the same two components, tFuzzyMatch and tRecordMatching. I have tried both, but they don't give me the results I'm looking for. Hopefully someone here has a solution or can point me in the right direction, as I find it hard to believe that no such component exists: this is part of the basic data cleansing process that ensures the output files are 99% accurate.
Thanks in advance for your help.