<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Remove files that as the same content in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279101#M54375</link>
    <description>Did it solve your problem?</description>
    <pubDate>Fri, 11 Jan 2019 14:24:20 GMT</pubDate>
    <dc:creator>akumar2301</dc:creator>
    <dc:date>2019-01-11T14:24:20Z</dc:date>
    <item>
      <title>Remove files that as the same content</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279094#M54368</link>
      <description>&lt;P&gt;Hello !&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I'd like to remove all files that have the same content but keep one.&lt;/P&gt; 
&lt;P&gt;The final result should be files with all different content.&lt;/P&gt; 
&lt;P&gt;&lt;FONT size="2"&gt;My file name format is &lt;EM&gt;fileName_timestamp.csv&lt;/EM&gt;&lt;/FONT&gt;&lt;/P&gt; 
&lt;P&gt;&lt;FONT face="arial black,avant garde" size="5"&gt;&lt;STRONG&gt;For example:&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt; 
&lt;P&gt;&lt;STRONG&gt;My directory looks like this:&lt;/STRONG&gt;&lt;/P&gt; 
&lt;P&gt;-&amp;nbsp;fileName_t1m3st4mp.csv&lt;/P&gt; 
&lt;P&gt;- fileName_0th3rt1m3st4mp.csv&lt;/P&gt; 
&lt;P&gt;- fileName_4n0th3rt1m3st4mp.csv&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;STRONG&gt;The content of my files looks like this:&lt;/STRONG&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;fileName_t1m3st4mp.csv&lt;/P&gt; 
&lt;PRE&gt;This is a content&lt;/PRE&gt; 
&lt;P&gt;fileName_0th3rt1m3st4mp.csv&lt;/P&gt; 
&lt;PRE&gt;This is a content&lt;/PRE&gt; 
&lt;P&gt;fileName_4n0th3rt1m3st4mp.csv&lt;/P&gt; 
&lt;PRE&gt;This is a different content&lt;/PRE&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;STRONG&gt;When I run the job:&lt;/STRONG&gt;&lt;/P&gt; 
&lt;P&gt;&lt;EM&gt;fileName_0th3rt1m3st4mp.csv should be deleted&lt;/EM&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;EM&gt;My directory should then contain only:&lt;/EM&gt;&lt;/P&gt; 
&lt;P&gt;&lt;EM&gt;-&amp;nbsp;&lt;/EM&gt;fileName_t1m3st4mp.csv&lt;/P&gt; 
&lt;P&gt;-&amp;nbsp;fileName_4n0th3rt1m3st4mp.csv&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I am using Talend ESB 7.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;If you have any suggestions, please share them!&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Thanks !&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 06:53:40 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279094#M54368</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T06:53:40Z</dc:date>
    </item>
    <item>
      <title>Re: Remove files that as the same content</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279095#M54369</link>
      <description>&lt;P&gt;Try tFileList, tMemorizeRows, tFileCompare and tFileDelete.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I'm not sure whether these are part of ESB.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Jan 2019 16:16:08 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279095#M54369</guid>
      <dc:creator>akumar2301</dc:creator>
      <dc:date>2019-01-09T16:16:08Z</dc:date>
    </item>
    <item>
      <title>Re: Remove files that as the same content</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279096#M54370</link>
      <description>&lt;P&gt;Thanks for your response!&lt;/P&gt;&lt;P&gt;Those components are indeed in ESB.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need to compare each file with all the others, and I'm not sure how to do that with a tFileCompare component, since it only allows one input.&lt;BR /&gt;&lt;BR /&gt;Can you walk me through your thinking?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Best regards,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2019 13:37:19 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279096#M54370</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-01-10T13:37:19Z</dc:date>
    </item>
    <item>
      <title>Re: Remove files that as the same content</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279097#M54371</link>
      <description>You are right, you might have some issues with tFileCompare.
&lt;BR /&gt;
&lt;BR /&gt;1) Get the checksum of each file using tFileProperties with the MD5 option.
&lt;BR /&gt;
&lt;BR /&gt;2) Find the files that have the same checksum and delete the duplicates.
&lt;BR /&gt;
&lt;BR /&gt;tFileList --&amp;gt; tFileProperties (MD5 option) --&amp;gt; tFileOutput
&lt;BR /&gt;
&lt;BR /&gt;OnSubjobOk
&lt;BR /&gt;
&lt;BR /&gt;tFileInput --&amp;gt; tUniqRow (get the duplicate file names based on the checksum) --&amp;gt; tFlowToIterate --&amp;gt; tFileDelete
&lt;BR /&gt;
&lt;BR /&gt;This should work.</description>
      <pubDate>Thu, 10 Jan 2019 15:29:11 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279097#M54371</guid>
      <dc:creator>akumar2301</dc:creator>
      <dc:date>2019-01-10T15:29:11Z</dc:date>
    </item>
    <item>
      <title>Re: Remove files that as the same content</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279098#M54372</link>
      <description>&lt;P&gt;Here you're mainly checking the file name, not the actual content.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I think I found something. I can log the content and the file name independently, but I can't find a way to get both of them at the same time.&lt;/P&gt; 
&lt;P&gt;My goal here is to get an output that contains every file name together with its content (fileName;fileContent).&lt;/P&gt; 
&lt;P&gt;I guess I'll be able to use a tUniqRow to check for duplicate content once I've figured this out.&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="JLHZkWa.png" style="width: 606px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M1tA.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/128006i1864518A40FD38B4/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M1tA.png" alt="0683p000009M1tA.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2019 15:45:56 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279098#M54372</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-01-10T15:45:56Z</dc:date>
    </item>
    <item>
      <title>Re: Remove files that as the same content</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279099#M54373</link>
      <description>tFileProperties computes the checksum from the file content, not from the file name.
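&lt;BR /&gt;
&lt;BR /&gt;As a rough illustration outside Talend (a Python sketch of the same idea, not the job itself): hash each file's bytes, remember the first file seen for each checksum, and delete any later file whose checksum is already known. The names md5_of_file, remove_duplicate_files and demo are hypothetical, and keeping the first file in name order is an assumption, since the thread does not say which copy should survive.

```python
import hashlib
import os
import tempfile


def md5_of_file(path):
    """Return the hex MD5 digest of a file's bytes (the name is not hashed)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remove_duplicate_files(directory):
    """Keep the first file (in name order) per checksum and delete the rest."""
    seen = {}      # checksum to the name of the file that was kept
    deleted = []   # names of the files that were removed
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        checksum = md5_of_file(path)
        if checksum in seen:
            os.remove(path)
            deleted.append(name)
        else:
            seen[checksum] = name
    return deleted


def demo():
    """Rebuild the three-file example from the question in a temp directory."""
    samples = {
        "fileName_t1m3st4mp.csv": "This is a content",
        "fileName_0th3rt1m3st4mp.csv": "This is a content",
        "fileName_4n0th3rt1m3st4mp.csv": "This is a different content",
    }
    with tempfile.TemporaryDirectory() as directory:
        for name, text in samples.items():
            with open(os.path.join(directory, name), "w") as handle:
                handle.write(text)
        deleted = remove_duplicate_files(directory)
        return deleted, sorted(os.listdir(directory))


if __name__ == "__main__":
    print(demo())
```

Which copy survives depends only on the iteration order. A different rule, such as keeping the file with the oldest timestamp in its name, would mean changing the sort key.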
&lt;BR /&gt;</description>
      <pubDate>Thu, 10 Jan 2019 15:50:52 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279099#M54373</guid>
      <dc:creator>akumar2301</dc:creator>
      <dc:date>2019-01-10T15:50:52Z</dc:date>
    </item>
    <item>
      <title>Re: Remove files that as the same content</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279100#M54374</link>
      <description>&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="removeduplicate.JPG" style="width: 791px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M1tF.jpg"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/154838iC3473D8EE6653588/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M1tF.jpg" alt="0683p000009M1tF.jpg" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;It worked: the duplicate files were removed. Give it a try.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2019 16:09:50 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279100#M54374</guid>
      <dc:creator>akumar2301</dc:creator>
      <dc:date>2019-01-10T16:09:50Z</dc:date>
    </item>
    <item>
      <title>Re: Remove files that as the same content</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279101#M54375</link>
      <description>Did it solve your problem?</description>
      <pubDate>Fri, 11 Jan 2019 14:24:20 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279101#M54375</guid>
      <dc:creator>akumar2301</dc:creator>
      <dc:date>2019-01-11T14:24:20Z</dc:date>
    </item>
    <item>
      <title>Re: Remove files that as the same content</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279102#M54376</link>
      <description>Which component did you rename to "selectMD5Option"?&lt;BR /&gt;I'll try that.</description>
      <pubDate>Tue, 15 Jan 2019 09:01:18 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Remove-files-that-as-the-same-content/m-p/2279102#M54376</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-01-15T09:01:18Z</dc:date>
    </item>
  </channel>
</rss>

