<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Implement SCD Type 2 in Talend in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Implement-SCD-Type-2-in-Talend/m-p/2236156#M24958</link>
    <description>&lt;P&gt;I need to create a process that incrementally imports data from a relational database into Hive/HDFS. The trick is that &lt;STRONG&gt;on Hive we need to maintain the history of transactions for each primary key&lt;/STRONG&gt;. This is what is called '&lt;STRONG&gt;Type 2 SCD&lt;/STRONG&gt;'. In other words, if the primary key (PK) is new, we simply insert a row in Hive; if it already exists, we need to mark the old row 'Inactive', update its 'End_Timestamp', and then insert a new row with 'Active Flag' set to 'true'.&lt;BR /&gt;&lt;BR /&gt;Currently, we do this with a MERGE query in Hive, which requires Compaction &amp;amp; Analyze; otherwise we run into OutOfMemory errors. We are thinking of replacing that with a Spark program, but we're wondering whether this can be done in Talend with no code or only minimal code.&lt;BR /&gt;&lt;BR /&gt;Please let me know. Thanks.&lt;/P&gt;</description>
    <pubDate>Sat, 16 Nov 2024 03:41:32 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2024-11-16T03:41:32Z</dc:date>
    <item>
      <title>Implement SCD Type 2 in Talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Implement-SCD-Type-2-in-Talend/m-p/2236156#M24958</link>
      <description>&lt;P&gt;I need to create a process that incrementally imports data from a relational database into Hive/HDFS. The trick is that &lt;STRONG&gt;on Hive we need to maintain the history of transactions for each primary key&lt;/STRONG&gt;. This is what is called '&lt;STRONG&gt;Type 2 SCD&lt;/STRONG&gt;'. In other words, if the primary key (PK) is new, we simply insert a row in Hive; if it already exists, we need to mark the old row 'Inactive', update its 'End_Timestamp', and then insert a new row with 'Active Flag' set to 'true'.&lt;BR /&gt;&lt;BR /&gt;Currently, we do this with a MERGE query in Hive, which requires Compaction &amp;amp; Analyze; otherwise we run into OutOfMemory errors. We are thinking of replacing that with a Spark program, but we're wondering whether this can be done in Talend with no code or only minimal code.&lt;BR /&gt;&lt;BR /&gt;Please let me know. Thanks.&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 03:41:32 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Implement-SCD-Type-2-in-Talend/m-p/2236156#M24958</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T03:41:32Z</dc:date>
    </item>
    <item>
      <title>Re: Implement SCD Type 2 in Talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Implement-SCD-Type-2-in-Talend/m-p/2236157#M24959</link>
      <description>&lt;P&gt;If the destination environment is a relational DB, there is the tDBSCD component, which will meet the need. Even though your destination environment is HDFS, you can still do this using the designated components. I found a post that is pretty close to your requirement; please read through it:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://community.qlik.com/s/feed/0D53p00007vCliTCAS" target="_blank"&gt;https://community.talend.com/t5/Design-and-Development/SCD-implementation-in-hive-hbase-using-Talend/td-p/15174&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2020 15:34:03 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Implement-SCD-Type-2-in-Talend/m-p/2236157#M24959</guid>
      <dc:creator>tnewbie</dc:creator>
      <dc:date>2020-01-03T15:34:03Z</dc:date>
    </item>
    <item>
      <title>Re: Implement SCD Type 2 in Talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Implement-SCD-Type-2-in-Talend/m-p/2236158#M24960</link>
      <description>&lt;P&gt;Thanks for the reply. There are a few issues with your post:&lt;BR /&gt;&lt;BR /&gt;1) The image is too small. I can't see its contents. I tried downloading it, etc.&lt;BR /&gt;2) The zip file is no longer available.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;By the way, can you please share some performance numbers for your solution? Say, for 10 million rows, 100 million rows, 1 billion rows, etc. Also, your answer is over 5 years old. Is this still the best way to do it?&lt;BR /&gt;&lt;BR /&gt;In any case, thanks for the reply. I will look into your solution.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2020 16:38:59 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Implement-SCD-Type-2-in-Talend/m-p/2236158#M24960</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-01-03T16:38:59Z</dc:date>
    </item>
    <item>
      <title>Re: Implement SCD Type 2 in Talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Implement-SCD-Type-2-in-Talend/m-p/2236159#M24961</link>
      <description>&lt;P&gt;One more concern: your solution requires creating a 'Delta' table, right? That means extra writes to disk. If we do it in Spark, it will use memory for this, at least to an extent. I would really like to know how well your solution performs. Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2020 16:50:38 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Implement-SCD-Type-2-in-Talend/m-p/2236159#M24961</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-01-03T16:50:38Z</dc:date>
    </item>
    <item>
      <title>Re: Implement SCD Type 2 in Talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Implement-SCD-Type-2-in-Talend/m-p/2236160#M24962</link>
      <description>&lt;P&gt;I guess my response did not make it clear that I have not worked on SCD/CDC to HDFS; I have only used SCD when the target environment was an RDBMS. That was neither my solution nor my recommendation. I found the URL by googling and shared it in the hope that it might help you.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2020 17:49:13 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Implement-SCD-Type-2-in-Talend/m-p/2236160#M24962</guid>
      <dc:creator>tnewbie</dc:creator>
      <dc:date>2020-01-03T17:49:13Z</dc:date>
    </item>
  </channel>
</rss>

