<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Fuzzy matching fields/columns in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Fuzzy-matching-fields-columns/m-p/2363796#M127626</link>
    <description>Hi Sabrina, 
&lt;BR /&gt;Sure. Essentially, I want to check Address 1 and Address 2 first to see if anything matches there. If it does, it should give me a score. In this case, say, the score would be 20 for matching 187 in both columns. 
&lt;BR /&gt;Then I want to check Address 2 and Address 3, which would also return a score. In this case the score would be much higher because more of the text matches, probably around 80-90 for matching Tom Street in both columns. 
&lt;BR /&gt;I don't know if there is a way to compare all 3 address columns at the same time. If there is, I expect it would work differently, as you would get an overall score instead of individual scores. However, I would prefer this to work as per my example above. This will create quite a few exceptions, but that is fine: it is exactly what we want, so that we can make sure the data is then cleaned and presented correctly. Here is an example of what I would expect to see in the exceptions output: 
&lt;BR /&gt;Address 1,Address 2,Score 
&lt;BR /&gt;Flat 187,187 Tom Street,20 
&lt;BR /&gt;Address 2,Address 3,Score 
&lt;BR /&gt;187 Tom Street,Tom Street,85 
&lt;BR /&gt;Thanks 
&lt;BR /&gt;Jay</description>
    <pubDate>Thu, 20 Oct 2016 11:05:49 GMT</pubDate>
    <dc:creator>jay6</dc:creator>
    <dc:date>2016-10-20T11:05:49Z</dc:date>
    <item>
      <title>Fuzzy matching fields/columns</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Fuzzy-matching-fields-columns/m-p/2363794#M127624</link>
      <description>&lt;P&gt;Hello.&lt;BR /&gt;I'm new to Talend and have a question about fuzzy matching. I've searched the forum but couldn't find an answer. This is what I'm trying to do:&lt;BR /&gt;I have a record which looks like this:&lt;BR /&gt;Address 1,Address 2,Address 3&lt;BR /&gt;Flat 187,187 Tom Street,Tom Street&lt;BR /&gt;&lt;BR /&gt;As you can see, some of the information is duplicated between the address fields. I want to 'fuzzy' match the data to get a score indicating what percentage of the data in one field matches another field, e.g. 187 from Address 1 matches 187 from Address 2. I need to do this at record and field level, not via a lookup.&lt;BR /&gt;I've tried a few of the components provided by Talend Studio 6.2, but none of them do what I'm trying to achieve, and I can't find any external components that do this either. Everything I have searched for has brought up 2 components (tFuzzyMatch and tRecordMatching), which I have tried, but they do not provide the results I'm looking for. Hopefully someone here has a solution or some direction on how to achieve this. I find it hard to believe that no component exists for this, as it is part of the basic data cleansing process to ensure you have 99% accurate data in the output files.&lt;BR /&gt;Thanks for your help in advance.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Oct 2016 14:24:12 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Fuzzy-matching-fields-columns/m-p/2363794#M127624</guid>
      <dc:creator>jay6</dc:creator>
      <dc:date>2016-10-19T14:24:12Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy matching fields/columns</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Fuzzy-matching-fields-columns/m-p/2363795#M127625</link>
      <description>Hi,&lt;BR /&gt;Could you also give us your expected result, please?&lt;BR /&gt;Best regards&lt;BR /&gt;Sabrina</description>
      <pubDate>Thu, 20 Oct 2016 03:36:24 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Fuzzy-matching-fields-columns/m-p/2363795#M127625</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-10-20T03:36:24Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy matching fields/columns</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Fuzzy-matching-fields-columns/m-p/2363796#M127626</link>
      <description>Hi Sabrina, 
&lt;BR /&gt;Sure. Essentially, I want to check Address 1 and Address 2 first to see if anything matches there. If it does, it should give me a score. In this case, say, the score would be 20 for matching 187 in both columns. 
&lt;BR /&gt;Then I want to check Address 2 and Address 3, which would also return a score. In this case the score would be much higher because more of the text matches, probably around 80-90 for matching Tom Street in both columns. 
&lt;BR /&gt;I don't know if there is a way to compare all 3 address columns at the same time. If there is, I expect it would work differently, as you would get an overall score instead of individual scores. However, I would prefer this to work as per my example above. This will create quite a few exceptions, but that is fine: it is exactly what we want, so that we can make sure the data is then cleaned and presented correctly. Here is an example of what I would expect to see in the exceptions output: 
&lt;BR /&gt;Address 1,Address 2,Score 
&lt;BR /&gt;Flat 187,187 Tom Street,20 
&lt;BR /&gt;Address 2,Address 3,Score 
&lt;BR /&gt;187 Tom Street,Tom Street,85 
&lt;BR /&gt;Thanks 
&lt;BR /&gt;Jay</description>
      <pubDate>Thu, 20 Oct 2016 11:05:49 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Fuzzy-matching-fields-columns/m-p/2363796#M127626</guid>
      <dc:creator>jay6</dc:creator>
      <dc:date>2016-10-20T11:05:49Z</dc:date>
    </item>
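    <!--
    Editorial sketch, not part of the original thread. The pairwise, field-to-field
    score described in the post above could be approximated in plain Java, for
    example from a Talend routine or a tJavaRow. The class name FieldOverlap, the
    method names, and the scoring formula (twice the longest shared run of
    characters, divided by the combined length of both fields) are assumptions
    chosen for illustration; they are not a Talend API and not the behaviour of
    tFuzzyMatch or tRecordMatching.

```java
import java.util.Locale;

public class FieldOverlap {

    // Length of the longest run of characters that appears in both strings.
    static int longestCommonSubstring(String a, String b) {
        int best = 0;
        int[] prev = new int[b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Extend the run that ended at the previous characters.
                    curr[j] = prev[j - 1] + 1;
                    best = Math.max(best, curr[j]);
                }
            }
            prev = curr;
        }
        return best;
    }

    // Score from 0 to 100 (assumed formula): 200 * shared run / combined length.
    // Fields are trimmed and lower-cased first; null is treated as empty.
    static int score(String left, String right) {
        String a = left == null ? "" : left.trim().toLowerCase(Locale.ROOT);
        String b = right == null ? "" : right.trim().toLowerCase(Locale.ROOT);
        int total = a.length() + b.length();
        if (total == 0) {
            return 0;
        }
        return (int) Math.round(200.0 * longestCommonSubstring(a, b) / total);
    }

    public static void main(String[] args) {
        // Address 1 vs Address 2: only "187" is shared.
        System.out.println(score("Flat 187", "187 Tom Street"));   // prints 27
        // Address 2 vs Address 3: "tom street" is shared.
        System.out.println(score("187 Tom Street", "Tom Street")); // prints 83
    }
}
```

    With this formula the sample record scores 27 and 83, close to but not the
    same as the illustrative 20 and 80-90 in the post. A token-based or
    edit-distance measure would give different numbers, so the formula and any
    exception threshold would need tuning against real data.
    -->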
    <item>
      <title>Re: Fuzzy matching fields/columns</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Fuzzy-matching-fields-columns/m-p/2363797#M127627</link>
      <description>Hi,&lt;BR /&gt;There has been no response, so it seems the Talend Team may not have a solution for this. Does anyone else have any suggestions for me, please?&lt;BR /&gt;Thanks&lt;BR /&gt;Jay</description>
      <pubDate>Mon, 31 Oct 2016 12:06:22 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Fuzzy-matching-fields-columns/m-p/2363797#M127627</guid>
      <dc:creator>jay6</dc:creator>
      <dc:date>2016-10-31T12:06:22Z</dc:date>
    </item>
  </channel>
</rss>

