<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to remove duplicates from two excel tables? in Data Quality</title>
    <link>https://community.qlik.com/t5/Data-Quality/How-to-remove-duplicates-from-two-excel-tables/m-p/2281677#M3659</link>
    <description>Hi TRF, thanks for the reply. I checked out the tFuzzyMatch component and was able to remove some duplicates using Levenshtein. However, my use case is slightly different. If I have two Excel tables with employee details, and the phone number is given as (234)-123-4567 in one table and 2341234567 in the other, I need a component that can compare both tables and decide that both records are the same employee, based on a regex or some other kind of logic. Is there anything like that available in Talend? Thanks</description>
    <pubDate>Tue, 03 Oct 2017 14:45:34 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2017-10-03T14:45:34Z</dc:date>
    <item>
      <title>How to remove duplicates from two excel tables?</title>
      <link>https://community.qlik.com/t5/Data-Quality/How-to-remove-duplicates-from-two-excel-tables/m-p/2281675#M3657</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I am fairly new to Talend. My use case is this: I have two Excel tables with employee data. The columns are name, email, street and phone number. I need to find the employees common to both tables, based on phone number or street, and put the data into a third Excel sheet. I can do that using tUniqRow and tUnite. However, the phone number could be in the format +1 8x9-201-1xx5 in one table and 8x9-201-1xx5 in the second table. The street field could be "Main street" in one table and "Main st" in the other. How can I deal with that? Should I use a tMap or a tregex? And how should I filter out the data? Thank you very much!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Sep 2017 20:51:39 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/How-to-remove-duplicates-from-two-excel-tables/m-p/2281675#M3657</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-09-28T20:51:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicates from two excel tables?</title>
      <link>https://community.qlik.com/t5/Data-Quality/How-to-remove-duplicates-from-two-excel-tables/m-p/2281676#M3658</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You should look into the tFuzzyMatch component, which helps with deduplication using the Levenshtein, Metaphone or Double Metaphone algorithm.&lt;/P&gt;&lt;P&gt;It could probably help you solve this kind of use case.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Let us know.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Sep 2017 07:57:10 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/How-to-remove-duplicates-from-two-excel-tables/m-p/2281676#M3658</guid>
      <dc:creator>TRF</dc:creator>
      <dc:date>2017-09-29T07:57:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicates from two excel tables?</title>
      <link>https://community.qlik.com/t5/Data-Quality/How-to-remove-duplicates-from-two-excel-tables/m-p/2281677#M3659</link>
      <description>Hi TRF, thanks for the reply. I checked out the tFuzzyMatch component and was able to remove some duplicates using Levenshtein. However, my use case is slightly different. If I have two Excel tables with employee details, and the phone number is given as (234)-123-4567 in one table and 2341234567 in the other, I need a component that can compare both tables and decide that both records are the same employee, based on a regex or some other kind of logic. Is there anything like that available in Talend? Thanks</description>
      <pubDate>Tue, 03 Oct 2017 14:45:34 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/How-to-remove-duplicates-from-two-excel-tables/m-p/2281677#M3659</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-10-03T14:45:34Z</dc:date>
    </item>
  </channel>
</rss>

