<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Fuzzy match &amp; dynamic join in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Fuzzy-match-dynamic-join/m-p/2357001#M122402</link>
    <description>Hello,&amp;nbsp; 
&lt;BR /&gt;I need help with my project. In my project I have 2 tables:&amp;nbsp; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;B&gt;CUSTOMERS &lt;/B&gt;table (id, name, surname, street, street number, zip code, phone, email) about 
&lt;B&gt;100 000 records&lt;/B&gt;.&amp;nbsp; 
&lt;BR /&gt; 
&lt;B&gt;USERS &lt;/B&gt;table (id, name + surname, street, street number, postal code) about 
&lt;B&gt;1 million records&lt;/B&gt;.&amp;nbsp; 
&lt;BR /&gt; 
&lt;BR /&gt;I need to 
&lt;B&gt;pair&lt;/B&gt; with the&amp;nbsp; 
&lt;B&gt;CUSTOMERS&amp;nbsp;&lt;/B&gt;table in the&amp;nbsp; 
&lt;B&gt;USERS&amp;nbsp;&lt;/B&gt;table.&amp;nbsp; 
&lt;BR /&gt;When a given 
&lt;B&gt;customer &lt;/B&gt;found in the table of&amp;nbsp; 
&lt;B&gt;USERS&amp;nbsp;&lt;/B&gt;, I set him a particular attribute. P 
&lt;U&gt;roblem is, even the street and names are not completely identical in&amp;nbsp;both tables.&amp;nbsp;&lt;/U&gt; 
&lt;BR /&gt;Within the pairing needs to do a 
&lt;B&gt;fuzzy match&lt;/B&gt;, follow these steps:&amp;nbsp; 
&lt;BR /&gt; 
&lt;BR /&gt;postal code must match&amp;nbsp; 
&lt;BR /&gt;street must have a levenshtein distance&amp;nbsp;value max 3&amp;nbsp; 
&lt;BR /&gt;The name must&amp;nbsp;have a levenshtein distance&amp;nbsp;value&amp;nbsp;max 5&amp;nbsp; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;B&gt;It is possible to do Fuzzy match on multiple columns?&lt;/B&gt; as adding a condition when equal psc &amp;amp;&amp;amp; fuzzy match on the street is max 3 and then come on fuzzy match on the name of the&amp;nbsp;CUSTOMERS&amp;nbsp;?&amp;nbsp; 
&lt;BR /&gt;Respectively, I tried to make the columns in both tables as the 
&lt;U&gt;concatenation &lt;/U&gt;psc + street + street number + surname + number. Problem is the 100-000 records seeking one record in a table of size 1 million record. It's awfully slow, it would take about 5 days.&amp;nbsp; 
&lt;BR /&gt;&amp;nbsp; 
&lt;BR /&gt; 
&lt;B&gt;&lt;U&gt;Is there any way to dynamically take all the records from the&amp;nbsp;CUSTOMERS&amp;nbsp;table with the value of a PSC, and to Him do dynamic table inner join&amp;nbsp;USERS&amp;nbsp;with the same psc, and from then on the table to do those little fuzzy join, and iteratively continue like this for all possible&amp;nbsp;values listed psc in the&amp;nbsp;CUSTOMERS&amp;nbsp;table&lt;/U&gt;&lt;/B&gt;?&amp;nbsp; 
&lt;BR /&gt;If you could, can you explain how to do this dynamically, respectively ja stock and set the component? 
&lt;BR /&gt;I use Talend Open Studio for Data Integration and PostgreSQL 9.3 
&lt;BR /&gt;Thanks for your help, I am at a loss.</description>
    <pubDate>Thu, 24 Jul 2014 09:44:14 GMT</pubDate>
    <dc:creator>peterko</dc:creator>
    <dc:date>2014-07-24T09:44:14Z</dc:date>
    <item>
      <title>Fuzzy match &amp; dynamic join</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Fuzzy-match-dynamic-join/m-p/2357001#M122402</link>
      <description>Hello,&amp;nbsp; 
&lt;BR /&gt;I need help with my project. In my project I have 2 tables:&amp;nbsp; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;B&gt;CUSTOMERS &lt;/B&gt;table (id, name, surname, street, street number, zip code, phone, email) about 
&lt;B&gt;100 000 records&lt;/B&gt;.&amp;nbsp; 
&lt;BR /&gt; 
&lt;B&gt;USERS &lt;/B&gt;table (id, name + surname, street, street number, postal code) about 
&lt;B&gt;1 million records&lt;/B&gt;.&amp;nbsp; 
&lt;BR /&gt; 
&lt;BR /&gt;I need to 
&lt;B&gt;pair&lt;/B&gt; with the&amp;nbsp; 
&lt;B&gt;CUSTOMERS&amp;nbsp;&lt;/B&gt;table in the&amp;nbsp; 
&lt;B&gt;USERS&amp;nbsp;&lt;/B&gt;table.&amp;nbsp; 
&lt;BR /&gt;When a given 
&lt;B&gt;customer &lt;/B&gt;found in the table of&amp;nbsp; 
&lt;B&gt;USERS&amp;nbsp;&lt;/B&gt;, I set him a particular attribute. P 
&lt;U&gt;roblem is, even the street and names are not completely identical in&amp;nbsp;both tables.&amp;nbsp;&lt;/U&gt; 
&lt;BR /&gt;Within the pairing needs to do a 
&lt;B&gt;fuzzy match&lt;/B&gt;, follow these steps:&amp;nbsp; 
&lt;BR /&gt; 
&lt;BR /&gt;postal code must match&amp;nbsp; 
&lt;BR /&gt;street must have a levenshtein distance&amp;nbsp;value max 3&amp;nbsp; 
&lt;BR /&gt;The name must&amp;nbsp;have a levenshtein distance&amp;nbsp;value&amp;nbsp;max 5&amp;nbsp; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;B&gt;It is possible to do Fuzzy match on multiple columns?&lt;/B&gt; as adding a condition when equal psc &amp;amp;&amp;amp; fuzzy match on the street is max 3 and then come on fuzzy match on the name of the&amp;nbsp;CUSTOMERS&amp;nbsp;?&amp;nbsp; 
&lt;BR /&gt;Respectively, I tried to make the columns in both tables as the 
&lt;U&gt;concatenation &lt;/U&gt;psc + street + street number + surname + number. Problem is the 100-000 records seeking one record in a table of size 1 million record. It's awfully slow, it would take about 5 days.&amp;nbsp; 
&lt;BR /&gt;&amp;nbsp; 
&lt;BR /&gt; 
&lt;B&gt;&lt;U&gt;Is there any way to dynamically take all the records from the&amp;nbsp;CUSTOMERS&amp;nbsp;table with the value of a PSC, and to Him do dynamic table inner join&amp;nbsp;USERS&amp;nbsp;with the same psc, and from then on the table to do those little fuzzy join, and iteratively continue like this for all possible&amp;nbsp;values listed psc in the&amp;nbsp;CUSTOMERS&amp;nbsp;table&lt;/U&gt;&lt;/B&gt;?&amp;nbsp; 
&lt;BR /&gt;If you could, can you explain how to do this dynamically, respectively ja stock and set the component? 
&lt;BR /&gt;I use Talend Open Studio for Data Integration and PostgreSQL 9.3 
&lt;BR /&gt;Thanks for your help, I am at a loss.</description>
      <pubDate>Thu, 24 Jul 2014 09:44:14 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Fuzzy-match-dynamic-join/m-p/2357001#M122402</guid>
      <dc:creator>peterko</dc:creator>
      <dc:date>2014-07-24T09:44:14Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy match &amp; dynamic join</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Fuzzy-match-dynamic-join/m-p/2357002#M122403</link>
      <description>Hi pantolik, 
&lt;BR /&gt;I don't thin that all of your 1 M records needs a fuzzy match... In order to optimize the process what you can do&amp;nbsp; 
&lt;BR /&gt;- First you have simple inner join using tMap for all of your records in join... 
&lt;BR /&gt;Above step may give you some x records and 1M-x records are rejected due to non match... keep it in separate table of buffer... 
&lt;BR /&gt;- Next step to use this table and again join using your fuzzy logic. 
&lt;BR /&gt;In order to improve the joining performance, I would recommend to create proper indexes on database table 
&lt;BR /&gt;use file system in tMap for storing the lookup data than the memory... 
&lt;BR /&gt;Try this, certainly performance of your job would be enhanced... once done we will try some other techniques as well. 
&lt;BR /&gt;Vaibhav</description>
      <pubDate>Thu, 24 Jul 2014 10:26:16 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Fuzzy-match-dynamic-join/m-p/2357002#M122403</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-07-24T10:26:16Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy match &amp; dynamic join</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Fuzzy-match-dynamic-join/m-p/2357003#M122404</link>
      <description>Hi, I did it as follows, but not to restrict users, only around 8%. It would not somehow make iterative sent CUSTOMER into Fuzzy Match based on its postal_code and dynamically on CUSTOMER postal code made a inner join with USER postal code and use it like lookup? 
&lt;BR /&gt; 
&lt;A href="https://community.talend.com/legacyfs/online/membersTempo/207884/talend.png" target="_blank"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MCVD.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/143834i09D57887E69CB701/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MCVD.png" alt="0683p000009MCVD.png" /&gt;&lt;/span&gt; &lt;/A&gt;</description>
      <pubDate>Thu, 24 Jul 2014 11:42:50 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Fuzzy-match-dynamic-join/m-p/2357003#M122404</guid>
      <dc:creator>peterko</dc:creator>
      <dc:date>2014-07-24T11:42:50Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy match &amp; dynamic join</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Fuzzy-match-dynamic-join/m-p/2357004#M122405</link>
      <description>Are you getting desired results?... Think of options given in earlier post.&lt;BR /&gt;Thanks&lt;BR /&gt;Vaibhav</description>
      <pubDate>Thu, 24 Jul 2014 11:55:22 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Fuzzy-match-dynamic-join/m-p/2357004#M122405</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-07-24T11:55:22Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy match &amp; dynamic join</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Fuzzy-match-dynamic-join/m-p/2357005#M122406</link>
      <description>There was little acceleration, but not very significant.
&lt;BR /&gt;Maybe I did not fully understand how you previously thought.
&lt;BR /&gt;Can you please explain how to connect components for example, at least in text style:
&lt;BR /&gt;
&lt;B&gt;compA &lt;/B&gt;---actionA---&amp;gt; 
&lt;B&gt;compB&lt;/B&gt; ---actionB---&amp;gt; 
&lt;B&gt;compC&amp;nbsp;&lt;/B&gt;
&lt;BR /&gt;
&lt;B&gt;compD &lt;/B&gt;---actionC---^</description>
      <pubDate>Thu, 24 Jul 2014 12:04:25 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Fuzzy-match-dynamic-join/m-p/2357005#M122406</guid>
      <dc:creator>peterko</dc:creator>
      <dc:date>2014-07-24T12:04:25Z</dc:date>
    </item>
  </channel>
</rss>

