<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Remove all duplicate rows in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295439#M68216</link>
    <description>&lt;P&gt;Did you try?&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;SELECT *&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;FROM table_name&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;WHERE name IN (&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;SELECT name FROM table_name&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;MINUS&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;SELECT name FROM table_name GROUP BY name HAVING COUNT(*) &amp;gt; 1&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;);&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 14 Aug 2017 06:59:10 GMT</pubDate>
    <dc:creator>wangbinlxx</dc:creator>
    <dc:date>2017-08-14T06:59:10Z</dc:date>
    <item>
      <title>Remove all duplicate rows</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295432#M68209</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt; 
&lt;P&gt;I'm new to Talend and I have a question.&lt;/P&gt; 
&lt;P&gt;Is there a component that lets me remove all duplicate rows?&lt;/P&gt; 
&lt;P&gt;Sample data:&lt;/P&gt; 
&lt;P&gt;Name, Age&lt;/P&gt; 
&lt;P&gt;HieuDoan, 15&lt;/P&gt; 
&lt;P&gt;LinhNa, 16&lt;/P&gt; 
&lt;P&gt;HieuDoan, 20&lt;/P&gt; 
&lt;P&gt;NamL, 17&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I have tried tUniqRow, but it still keeps the first of the duplicate rows instead of removing all of them.&lt;/P&gt; 
&lt;P&gt;The output I got with tUniqRow:&lt;/P&gt; 
&lt;P&gt;HieuDoan, 15&lt;/P&gt; 
&lt;P&gt;LinhNa, 16&lt;/P&gt; 
&lt;P&gt;NamL, 17&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;The output I need:&lt;/P&gt; 
&lt;P&gt;LinhNa, 16&lt;/P&gt; 
&lt;P&gt;NamL, 17&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Can you give me a suggestion? Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 11 Aug 2017 03:11:59 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295432#M68209</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-08-11T03:11:59Z</dc:date>
    </item>
    <item>
      <title>Re: Remove all duplicate rows</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295433#M68210</link>
      <description>&lt;P&gt;Hi&lt;BR /&gt;No single component can do this directly. You can cache the rows in memory and do an inner join in the next subjob to get the expected result, e.g.:&lt;BR /&gt;tFileInputDelimited--main--tUniqRow--uniques--tHashOutput1&lt;BR /&gt; --duplicates--tHashOutput2&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;|&lt;BR /&gt;OnSubjobOk&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; |&lt;BR /&gt;tHashInput1--main--tMap--tLogRow&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;|&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;lookup&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; |&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; tHashInput2&lt;BR /&gt;tHashInput1: reads data from tHashOutput1&lt;BR /&gt;tHashInput2: reads data from tHashOutput2&lt;BR /&gt;&lt;BR /&gt;In tMap: do an inner join on the keys you defined in tUniqRow, and set the 'Catch lookup inner join reject' option to true on the output table. The rejected rows are the ones with no duplicate, which is the result you need.&lt;BR /&gt;&lt;BR /&gt;Regards&lt;BR /&gt;Shong&lt;/P&gt;</description>
      <pubDate>Fri, 11 Aug 2017 04:52:21 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295433#M68210</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-08-11T04:52:21Z</dc:date>
    </item>
    <item>
      <title>Re: Remove all duplicate rows</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295434#M68211</link>
      <description>Hi Shong,
&lt;BR /&gt;My data source is Redshift and it has a lot of data. Where can I cache the result so I can use it later?</description>
      <pubDate>Fri, 11 Aug 2017 07:39:10 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295434#M68211</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-08-11T07:39:10Z</dc:date>
    </item>
    <item>
      <title>Re: Remove all duplicate rows</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295435#M68212</link>
      <description>If there is a lot of data, store it on disk instead of in memory.</description>
      <pubDate>Fri, 11 Aug 2017 08:01:51 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295435#M68212</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-08-11T08:01:51Z</dc:date>
    </item>
    <item>
      <title>Re: Remove all duplicate rows</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295436#M68213</link>
      <description>&lt;P&gt;Talend is behaving correctly: tUniqRow does NOT remove all &lt;STRONG&gt;rows that have duplicates&lt;/STRONG&gt;; it removes the &lt;STRONG&gt;duplicates&lt;/STRONG&gt; and keeps the &lt;STRONG&gt;first occurrence&lt;/STRONG&gt; of each value.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Your example illustrates this well.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;If you want to remove every row whose value occurs more than once in the table,&lt;/P&gt; 
&lt;P&gt;that is easy to do with an SQL query. If you prefer Talend, you can use:&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;tRedshiftInput -&amp;gt; tAggregateRow (count by value) -&amp;gt; tFilterRow (condition: count == 1)&lt;BR /&gt;&lt;BR /&gt;The result will be what you want:&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;PRE&gt;LinhNa, 16
NamL, 17&lt;/PRE&gt; 
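&lt;P&gt;The count-then-filter logic of that flow can also be sketched outside Talend. The following is only an illustration in Python, not Talend code: the helper name keep_singletons and the hard-coded sample rows are assumptions made for this example.&lt;/P&gt;
&lt;PRE&gt;
```python
from collections import Counter

# Sample rows from the question: (name, age)
rows = [("HieuDoan", 15), ("LinhNa", 16), ("HieuDoan", 20), ("NamL", 17)]


def keep_singletons(rows, key_index=0):
    """Keep only rows whose key value occurs exactly once (hypothetical helper)."""
    # Step 1: count rows per key, like the aggregate step.
    counts = Counter(row[key_index] for row in rows)
    # Step 2: keep rows whose key was counted exactly once, like the filter step.
    return [row for row in rows if counts[row[key_index]] == 1]


for name, age in keep_singletons(rows):
    print(f"{name}, {age}")
```
&lt;/PRE&gt;
&lt;P&gt;With the sample rows this prints LinhNa, 16 and NamL, 17. Note that the original rows are filtered against the counts, so the other columns (here the age) are kept.&lt;/P&gt;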
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;As an aside:&lt;/P&gt; 
&lt;PRE&gt;HieuDoan, 15 and HieuDoan, 20&lt;/PRE&gt; 
&lt;P&gt;are most likely two different people with the same name but different ages :-), but that is another topic.&lt;/P&gt;</description>
      <pubDate>Sat, 12 Aug 2017 09:45:43 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295436#M68213</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2017-08-12T09:45:43Z</dc:date>
    </item>
    <item>
      <title>Re: Remove all duplicate rows</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295437#M68214</link>
      <description>&lt;P&gt;If your&amp;nbsp;&lt;SPAN&gt;source is Redshift, why not do this in the database with a simple query?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;SELECT *&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;FROM table_name&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;WHERE name IN (&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;SELECT name FROM table_name&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;MINUS&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;SELECT name FROM table_name GROUP BY name HAVING COUNT(*) &amp;gt; 1&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;);&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Depending on the size of the table, you may need to tune the query. Even so, you will most likely get the best performance by doing this inside the database.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 13 Aug 2017 14:24:26 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295437#M68214</guid>
      <dc:creator>wangbinlxx</dc:creator>
      <dc:date>2017-08-13T14:24:26Z</dc:date>
    </item>
    <item>
      <title>Re: Remove all duplicate rows</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295438#M68215</link>
      <description>&lt;P&gt;Thanks Vapukov,&lt;BR /&gt;How can I do it with an SQL query?&lt;/P&gt;</description>
      <pubDate>Mon, 14 Aug 2017 05:30:09 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295438#M68215</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-08-14T05:30:09Z</dc:date>
    </item>
    <item>
      <title>Re: Remove all duplicate rows</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295439#M68216</link>
      <description>&lt;P&gt;Did you try?&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;SELECT *&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;FROM table_name&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;WHERE name IN (&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;SELECT name FROM table_name&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;MINUS&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;SELECT name FROM table_name GROUP BY name HAVING COUNT(*) &amp;gt; 1&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;);&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 14 Aug 2017 06:59:10 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295439#M68216</guid>
      <dc:creator>wangbinlxx</dc:creator>
      <dc:date>2017-08-14T06:59:10Z</dc:date>
    </item>
    <item>
      <title>Re: Remove all duplicate rows</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295440#M68217</link>
      <description>&lt;P&gt;In the general case (assuming you want to keep only the names that occur exactly once in the table, and drop every record whose name occurs more than once)&lt;/P&gt; 
&lt;P&gt;it works like this:&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;PRE&gt;SELECT
     t1.*
FROM table_name t1 INNER JOIN
     (
     SELECT
            -- each group has exactly one row, so MIN(id) is that row's id
            MIN(id) AS id
     FROM table_name
     GROUP BY "name"
     HAVING COUNT(*) = 1
     ) t2 ON t1.id = t2.id&lt;/PRE&gt; 
&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;The code above assumes the table has a primary key named "id". If the table has no primary key, the logic can be different, for example:&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;PRE&gt;SELECT 
     t1.*
FROM table_name t1 INNER JOIN 
     (
     SELECT
            name 
     FROM table_name
     GROUP BY "name"
     HAVING count(*) = 1 
     ) t2 ON t1.name = t2.name&lt;/PRE&gt; 
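&lt;P&gt;To try the no-primary-key variant on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for Redshift here, and the table name people and the column age are assumptions for this example, so treat it as a check of the logic rather than of Redshift behaviour.&lt;/P&gt;
&lt;PRE&gt;
```python
import sqlite3

# In-memory SQLite database standing in for the real source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people (name, age) VALUES (?, ?)",
    [("HieuDoan", 15), ("LinhNa", 16), ("HieuDoan", 20), ("NamL", 17)],
)

# Keep only rows whose name occurs exactly once in the table.
query = """
SELECT t1.name, t1.age
FROM people t1
INNER JOIN (
    SELECT name
    FROM people
    GROUP BY name
    HAVING COUNT(*) = 1
) t2 ON t1.name = t2.name
ORDER BY t1.name
"""
result = conn.execute(query).fetchall()
conn.close()

for name, age in result:
    print(f"{name}, {age}")
```
&lt;/PRE&gt;
&lt;P&gt;With the sample rows this prints LinhNa, 16 and NamL, 17; both HieuDoan rows are dropped because that name occurs twice.&lt;/P&gt;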
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;It will need adjusting to your real table structure, since what I have typed here is only "theoretical code".&lt;/P&gt;</description>
      <pubDate>Mon, 14 Aug 2017 07:10:40 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-all-duplicate-rows/m-p/2295440#M68217</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2017-08-14T07:10:40Z</dc:date>
    </item>
  </channel>
</rss>

