<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Sort of Fuzzy Matching in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Sort-of-Fuzzy-Matching/m-p/2242619#M29377</link>
    <description>&lt;P&gt;Warning, newbie question!&lt;/P&gt; 
&lt;P&gt;I have two files whose data I am trying to merge on a specific field (an inner join). File A has a reasonably clean reference field that is easy to parse and use. File B, on the other hand, is an amalgamation of data from a variety of sources, so its reference field comes in all shapes and sizes. I still want to be able to match them. Here is a practical, fictitious example of a reference in the two files:&lt;/P&gt;
&lt;P&gt;File A: "Joe Bloggs"&lt;/P&gt; 
&lt;P&gt;File B: Fund Transfer : JoeBloggsACME-883366133256 : JOE BLOGGS BLOGGS Debit Account: 12196895 Credit Account: 12856966&lt;/P&gt;
&lt;P&gt;Here is another example from the same two files, showing how much the format can vary even within the same file:&lt;/P&gt;
&lt;P&gt;File A: 432046055941&lt;/P&gt; 
&lt;P&gt;File B: "REF 432046055941"&lt;/P&gt; 
&lt;P&gt;Clearly, a plain inner join won't work. However, as you can see, the fields in the two files share enough common text that I should be able to match them. The problem is that the format is inconsistent, so I cannot build a single string-manipulation formula. I have dabbled with the tFuzzyMatch component, but I didn't get great results, and I suspect it is too "high-brow" for my problem.&lt;/P&gt;
&lt;P&gt;Can anybody suggest another component or setting to use, or point me in the right direction, please?&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Thu, 04 Oct 2018 19:19:35 GMT</pubDate>
    <dc:creator>kdv</dc:creator>
    <dc:date>2018-10-04T19:19:35Z</dc:date>
    <item>
      <title>Sort of Fuzzy Matching</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Sort-of-Fuzzy-Matching/m-p/2242619#M29377</link>
      <description>&lt;P&gt;Warning, newbie question!&lt;/P&gt; 
&lt;P&gt;I have two files whose data I am trying to merge on a specific field (an inner join). File A has a reasonably clean reference field that is easy to parse and use. File B, on the other hand, is an amalgamation of data from a variety of sources, so its reference field comes in all shapes and sizes. I still want to be able to match them. Here is a practical, fictitious example of a reference in the two files:&lt;/P&gt;
&lt;P&gt;File A: "Joe Bloggs"&lt;/P&gt; 
&lt;P&gt;File B: Fund Transfer : JoeBloggsACME-883366133256 : JOE BLOGGS BLOGGS Debit Account: 12196895 Credit Account: 12856966&lt;/P&gt;
&lt;P&gt;Here is another example from the same two files, showing how much the format can vary even within the same file:&lt;/P&gt;
&lt;P&gt;File A: 432046055941&lt;/P&gt; 
&lt;P&gt;File B: "REF 432046055941"&lt;/P&gt; 
&lt;P&gt;Clearly, a plain inner join won't work. However, as you can see, the fields in the two files share enough common text that I should be able to match them. The problem is that the format is inconsistent, so I cannot build a single string-manipulation formula. I have dabbled with the tFuzzyMatch component, but I didn't get great results, and I suspect it is too "high-brow" for my problem.&lt;/P&gt;
&lt;P&gt;Can anybody suggest another component or setting to use, or point me in the right direction, please?&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 04 Oct 2018 19:19:35 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Sort-of-Fuzzy-Matching/m-p/2242619#M29377</guid>
      <dc:creator>kdv</dc:creator>
      <dc:date>2018-10-04T19:19:35Z</dc:date>
    </item>
    <item>
      <title>Re: Sort of Fuzzy Matching</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Sort-of-Fuzzy-Matching/m-p/2242620#M29378</link>
      <description>&lt;P&gt;if it always (!!!) as described in examples - you just need&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;StringHandling.INDEX("hello world!", "hello") != -1&lt;/P&gt;
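&lt;P&gt;Below is a minimal sketch of how that containment check could be made more tolerant. It assumes the rule is: "File A's reference appears somewhere in File B's field, ignoring case, spaces and punctuation". This matters because a case-sensitive search for "Joe Bloggs" would not find the first example, where File B only contains "JoeBloggs" and "JOE BLOGGS". The class RefMatch and its methods normalize and matches are hypothetical names, not part of Talend. In a job, the same logic could presumably be placed in a custom routine and called from an expression.&lt;/P&gt;

```java
import java.util.Locale;

// Hypothetical helper (not a Talend routine): checks whether the clean
// reference from File A occurs inside the messy field from File B
// once both strings have been normalized.
class RefMatch {

    // Upper-case the text and keep only ASCII letters and digits, so that
    // "Joe Bloggs", "JoeBloggs" and "JOE BLOGGS" all become "JOEBLOGGS".
    static String normalize(String s) {
        if (s == null) {
            return "";
        }
        return s.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9]", "");
    }

    // Same idea as StringHandling.INDEX(fieldB, refA) != -1,
    // but applied to the normalized strings.
    static boolean matches(String refA, String fieldB) {
        String needle = normalize(refA);
        if (needle.isEmpty()) {
            return false; // an empty reference would otherwise match every row
        }
        return normalize(fieldB).indexOf(needle) != -1;
    }

    public static void main(String[] args) {
        String b1 = "Fund Transfer : JoeBloggsACME-883366133256 : JOE BLOGGS BLOGGS"
                + " Debit Account: 12196895 Credit Account: 12856966";
        System.out.println(matches("Joe Bloggs", b1));                    // true
        System.out.println(matches("432046055941", "REF 432046055941")); // true
        System.out.println(matches("Jane Doe", b1));                      // false
    }
}
```

&lt;P&gt;The normalization is deliberately aggressive, and that has a cost. A short reference, such as a few digits, may also occur inside an account number and produce a false match, so a minimum reference length or a second check may be needed. The [^A-Z0-9] pattern also drops accented letters, which is another assumption to revisit for real data.&lt;/P&gt;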
&lt;P&gt;If reality is more complicated, you will need to think about it more.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Oct 2018 06:56:48 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Sort-of-Fuzzy-Matching/m-p/2242620#M29378</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2018-10-05T06:56:48Z</dc:date>
    </item>
  </channel>
</rss>

