<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic [resolved] Difference in data rows in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/resolved-Difference-in-data-rows/m-p/2251731#M35589</link>
    <description>How can I find the difference between data in two data files?
&lt;BR /&gt;I have two files that contain rows of customer data, with the first column being the primary key.
&lt;BR /&gt;Suppose the first file (reference.csv) has the following records:
&lt;BR /&gt;1,John,Smith,04/02/1970,36000
&lt;BR /&gt;2,Jane,Doe,09/23/2000,12000
&lt;BR /&gt;3,Richard,Johnson,12/02/1990,29000
&lt;BR /&gt;And the second file (compare.csv) has the following records:
&lt;BR /&gt;1,John,Smith,04/02/1970,77777
&lt;BR /&gt;2,Jane,Doe,09/23/2000,12000
&lt;BR /&gt;3,Richard,Johnson,12/02/1990,29000
&lt;BR /&gt;4,Mary,Jones,03/12/1956,52000
&lt;BR /&gt;I would like to have an output file (difference.csv) with the following records:
&lt;BR /&gt;1,John,Smith,04/02/1970,77777
&lt;BR /&gt;4,Mary,Jones,03/12/1956,52000
&lt;BR /&gt;So, the job would look for any rows in compare.csv that differ from or do not exist in reference.csv, then output only those rows to difference.csv.
&lt;BR /&gt;In the previous example, the output showed the first record from compare.csv because the fifth column changed from "36000" in reference.csv to "77777" in compare.csv.
&lt;BR /&gt;It also showed the fourth record from compare.csv because it did not exist in reference.csv.</description>
    <pubDate>Sat, 16 Nov 2024 13:17:04 GMT</pubDate>
    <dc:creator>_AnonymousUser</dc:creator>
    <dc:date>2024-11-16T13:17:04Z</dc:date>
    <item>
      <title>[resolved] Difference in data rows</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Difference-in-data-rows/m-p/2251731#M35589</link>
      <description>How can I find the difference between data in two data files?
&lt;BR /&gt;I have two files that contain rows of customer data, with the first column being the primary key.
&lt;BR /&gt;Suppose the first file (reference.csv) has the following records:
&lt;BR /&gt;1,John,Smith,04/02/1970,36000
&lt;BR /&gt;2,Jane,Doe,09/23/2000,12000
&lt;BR /&gt;3,Richard,Johnson,12/02/1990,29000
&lt;BR /&gt;And the second file (compare.csv) has the following records:
&lt;BR /&gt;1,John,Smith,04/02/1970,77777
&lt;BR /&gt;2,Jane,Doe,09/23/2000,12000
&lt;BR /&gt;3,Richard,Johnson,12/02/1990,29000
&lt;BR /&gt;4,Mary,Jones,03/12/1956,52000
&lt;BR /&gt;I would like to have an output file (difference.csv) with the following records:
&lt;BR /&gt;1,John,Smith,04/02/1970,77777
&lt;BR /&gt;4,Mary,Jones,03/12/1956,52000
&lt;BR /&gt;So, the job would look for any rows in compare.csv that differ from or do not exist in reference.csv, then output only those rows to difference.csv.
&lt;BR /&gt;In the previous example, the output showed the first record from compare.csv because the fifth column changed from "36000" in reference.csv to "77777" in compare.csv.
&lt;BR /&gt;It also showed the fourth record from compare.csv because it did not exist in reference.csv.</description>
      <pubDate>Sat, 16 Nov 2024 13:17:04 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Difference-in-data-rows/m-p/2251731#M35589</guid>
      <dc:creator>_AnonymousUser</dc:creator>
      <dc:date>2024-11-16T13:17:04Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Difference in data rows</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Difference-in-data-rows/m-p/2251732#M35590</link>
      <description>Never mind, I solved the problem. It wasn't working before because there was a problem with the format of the new data file, but now it works. Solution: 
&lt;BR /&gt;I used the two files as inputs to a tMap, with the newer data file (compare.csv) as the main row and the older one (reference.csv) as the lookup input.
&lt;BR /&gt;I used every field from the main input (compareIn) as a foreign key in the lookup input (referenceIn) so that entire rows are compared,
&lt;BR /&gt;then mapped every field from compareIn to the output (differenceOut).
&lt;BR /&gt;I activated "Inner Join" on referenceIn and "Inner Join reject" on differenceOut,
&lt;BR /&gt;so that rows from compareIn that are identical to a row in referenceIn are excluded
&lt;BR /&gt;and only the remaining rows are written to the output.
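&lt;BR /&gt;To sanity-check the result outside Talend, the same whole-row comparison can be sketched in a few lines of Python. This is only an illustration of the logic, not the tMap job itself: the helper names difference_rows and read_rows are made up for this example, and the sample rows are the ones from the question.

```python
import csv
import io


def difference_rows(reference_rows, compare_rows):
    """Return the rows of compare_rows that have no identical row in reference_rows."""
    # Every field takes part in the match, like a join keyed on all columns.
    reference_set = {tuple(row) for row in reference_rows}
    return [row for row in compare_rows if tuple(row) not in reference_set]


def read_rows(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text)))


# Sample data copied from the question (reference.csv and compare.csv).
reference = read_rows(
    "1,John,Smith,04/02/1970,36000\n"
    "2,Jane,Doe,09/23/2000,12000\n"
    "3,Richard,Johnson,12/02/1990,29000\n"
)
compare = read_rows(
    "1,John,Smith,04/02/1970,77777\n"
    "2,Jane,Doe,09/23/2000,12000\n"
    "3,Richard,Johnson,12/02/1990,29000\n"
    "4,Mary,Jones,03/12/1956,52000\n"
)

difference = difference_rows(reference, compare)
for row in difference:
    # Prints rows 1 and 4 of compare.csv, the expected difference.csv content.
    print(",".join(row))
```

&lt;BR /&gt;The sketch compares fields as plain strings, so two values that differ only in formatting (for example extra spaces) count as different rows.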
&lt;BR /&gt;I'm not sure if this is the most logically sound way to perform the operation. I'm very new to Talend Open Studio. If there's a more elegant solution, please post it or where to find it. 
&lt;BR /&gt;It works perfectly, however.</description>
      <pubDate>Fri, 17 Sep 2010 20:14:42 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Difference-in-data-rows/m-p/2251732#M35590</guid>
      <dc:creator>_AnonymousUser</dc:creator>
      <dc:date>2010-09-17T20:14:42Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Difference in data rows</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Difference-in-data-rows/m-p/2251733#M35591</link>
      <description>Hi Mark,&lt;BR /&gt;I tried to do the same thing you did, but I am not able to see constraints or joins in tMap. Please help me.&lt;BR /&gt;My requirement is to compare a DB table with a flat file.&lt;BR /&gt;I converted the DB table to a flat file and compared it with the other positioned file.&lt;BR /&gt;Regards,&lt;BR /&gt;Manoj.V</description>
      <pubDate>Wed, 13 Oct 2010 12:41:56 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Difference-in-data-rows/m-p/2251733#M35591</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2010-10-13T12:41:56Z</dc:date>
    </item>
    <item>
      <title>Re: [resolved] Difference in data rows</title>
      <link>https://community.qlik.com/t5/Talend-Studio/resolved-Difference-in-data-rows/m-p/2251734#M35592</link>
      <description>My 2 cents, with the help of an answer that I saw in another thread:&lt;BR /&gt;All fields that are used as foreign keys must be initialized. If a key field contains a null value, the row will be "join rejected".&lt;BR /&gt;NULL != NULL</description>
      <pubDate>Tue, 23 Oct 2012 09:51:49 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/resolved-Difference-in-data-rows/m-p/2251734#M35592</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-10-23T09:51:49Z</dc:date>
    </item>
  </channel>
</rss>

