<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Soft delete when there are duplicates in source in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Soft-delete-when-there-are-duplicates-in-source/m-p/2253756#M36997</link>
    <description>&lt;P&gt;To do a ranking you need a tSortRow and a tMap.&lt;/P&gt;
&lt;P&gt;You may omit the tSortRow component if the duplicate rows are exactly the same; if some fields differ, you can sort by those fields. Then, in the tMap, create an integer variable. In the expression write: Numeric.sequence(fields you want to group by,1,1). The sequence name must be a single string, so concatenate the grouping fields, for example:&amp;nbsp;Numeric.sequence(field1.toString() + field2.toString(),1,1)&lt;/P&gt;</description>
    <pubDate>Wed, 15 Jan 2020 13:06:40 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2020-01-15T13:06:40Z</dc:date>
    <item>
      <title>Soft delete when there are duplicates in source</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Soft-delete-when-there-are-duplicates-in-source/m-p/2253755#M36996</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am reading data from a CSV file which &lt;STRONG&gt;does not have a primary key&lt;/STRONG&gt; and contains &lt;STRONG&gt;duplicate entries&lt;/STRONG&gt; as well. This data is loaded into a database. The requirement is to implement a soft delete when something is deleted from the source.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;For example, if there are 4 occurrences of a particular record in the source and one of them is deleted, one of the records should be marked as N in the target during the next load and the remaining three should stay Y. If two of them are deleted, two records should be marked as N and the remaining two should stay Y.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I am thinking about numbering each group of duplicates in incremental order and using that column to differentiate between the duplicates. However, I'm not sure how to implement this in practice.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Can someone please help with ideas?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks in advance,&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 03:34:54 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Soft-delete-when-there-are-duplicates-in-source/m-p/2253755#M36996</guid>
      <dc:creator>Aami</dc:creator>
      <dc:date>2024-11-16T03:34:54Z</dc:date>
    </item>
    <item>
      <title>Re: Soft delete when there are duplicates in source</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Soft-delete-when-there-are-duplicates-in-source/m-p/2253756#M36997</link>
      <description>&lt;P&gt;To do a ranking you need a tSortRow and a tMap.&lt;/P&gt;
&lt;P&gt;You may omit the tSortRow component if the duplicate rows are exactly the same; if some fields differ, you can sort by those fields. Then, in the tMap, create an integer variable. In the expression write: Numeric.sequence(fields you want to group by,1,1). The sequence name must be a single string, so concatenate the grouping fields, for example:&amp;nbsp;Numeric.sequence(field1.toString() + field2.toString(),1,1)&lt;/P&gt;</description>
      <pubDate>Wed, 15 Jan 2020 13:06:40 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Soft-delete-when-there-are-duplicates-in-source/m-p/2253756#M36997</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-01-15T13:06:40Z</dc:date>
    </item>
    <item>
      <title>Re: Soft delete when there are duplicates in source</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Soft-delete-when-there-are-duplicates-in-source/m-p/2253757#M36998</link>
      <description>&lt;P&gt;Any idea how to achieve this?&lt;/P&gt;&lt;P&gt;Since the source doesn't have a primary key and has duplicate rows, marking a deleted record as inactive in the target looks like a challenge.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jan 2020 09:48:58 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Soft-delete-when-there-are-duplicates-in-source/m-p/2253757#M36998</guid>
      <dc:creator>Aami</dc:creator>
      <dc:date>2020-01-24T09:48:58Z</dc:date>
    </item>
    <item>
      <title>Re: Soft delete when there are duplicates in source</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Soft-delete-when-there-are-duplicates-in-source/m-p/2253758#M36999</link>
      <description>&lt;OL&gt;&lt;LI&gt;Select all active records from the target and the source and store them in a buffer (tHashOutput component).&lt;/LI&gt;
&lt;LI&gt;While fetching records from the source, create a hash value (&lt;B&gt;Hash1&lt;/B&gt;) of all the applicable columns using DataMasking.createMD5 or a similar hashing technique in tMap1. In tMap2, create a numeric sequence per hash value using Numeric.sequence(Hash1,1,1). Use this sequence as the ID for the target table, and create a second hash key (&lt;B&gt;Hash2&lt;/B&gt;) from all the applicable columns + Hash1 + the numeric sequence (ID).&lt;/LI&gt;
&lt;LI&gt;&lt;B&gt;Insert new records:&lt;/B&gt; Use the source data as input 1 and the target data as input 2 of a tMap and join them on &lt;B&gt;Hash2 and ID&lt;/B&gt; created in the previous step and on all the applicable columns. Insert the row if no match is found (the target-side &lt;B&gt;ID == null&lt;/B&gt;). Add the columns below with the default values shown:&lt;P&gt;ROW_EFFECTIVE_DATE -&amp;gt; TalendDate.getCurrentDate()&lt;/P&gt;&lt;P&gt;ROW_EXPIRY_DATE -&amp;gt; TalendDate.parseDate("yyyy-MM-dd","9999-12-31")&lt;/P&gt;&lt;P&gt;ROW_CREATED_DATE -&amp;gt; TalendDate.getCurrentDate()&lt;/P&gt;&lt;P&gt;ROW_UPDATED_DATE -&amp;gt; TalendDate.getCurrentDate()&lt;/P&gt;&lt;P&gt;ACTIVE_IND -&amp;gt; "Y"&lt;/P&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;B&gt;Logical delete of records deleted in the source:&lt;/B&gt; Use the target as input 1 and the source as input 2 of a tMap and join them on &lt;B&gt;Hash2 and ID&lt;/B&gt; created in step 2 and on all the applicable columns. If no match is found (the source-side &lt;B&gt;ID == null&lt;/B&gt;), update the active indicator to N:&lt;P&gt;ROW_EFFECTIVE_DATE -&amp;gt; keep as it is in the target&lt;/P&gt;&lt;P&gt;ROW_EXPIRY_DATE -&amp;gt; TalendDate.addDate(TalendDate.getCurrentDate(),-1,"dd")&lt;/P&gt;&lt;P&gt;ROW_CREATED_DATE -&amp;gt; keep as it is in the target&lt;/P&gt;&lt;P&gt;ROW_UPDATED_DATE -&amp;gt; TalendDate.getCurrentDate()&lt;/P&gt;&lt;P&gt;ACTIVE_IND -&amp;gt; "N"&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Mon, 02 Nov 2020 07:42:34 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Soft-delete-when-there-are-duplicates-in-source/m-p/2253758#M36999</guid>
      <dc:creator>Aami</dc:creator>
      <dc:date>2020-11-02T07:42:34Z</dc:date>
    </item>
  </channel>
</rss>

