<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Slowly changing dimension - PostgresQL - scalability issue in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226463#M18366</link>
    <description>&lt;P&gt;It certainly isn't! &lt;IMG id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://community.qlik.com/" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/P&gt; 
&lt;P&gt;Where do I enable the logging statements? I don't see those options under the basic and debug run.&lt;/P&gt;</description>
    <pubDate>Wed, 24 Jun 2020 14:28:19 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2020-06-24T14:28:19Z</dc:date>
    <item>
      <title>Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226455#M18358</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt; 
&lt;P&gt;I used the SCD component for Postgres to implement a slowly changing dimension and though it works, it only seems to run efficiently without any Java memory heap errors on small data sets. My dimension is roughly 30M records and when I try to run the job on a 10M size dimension with 50 incoming type 2 changes, the job will take hours to run only to eventually give a Java memory heap error.&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;So, my question is, is this SCD component only meant for small tables and just a quick convenient way to implement it for this specific purpose or is this supposed to work for big data sets as well? If not, is the way to go a custom made solution consisting of lookup and conditional splits for type 1 and 2 fields with more efficient bulk updates instead of row by row which the SCD component perhaps does?&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Roland&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jun 2020 18:44:21 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226455#M18358</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-06-19T18:44:21Z</dc:date>
    </item>
    <item>
      <title>Re: Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226456#M18359</link>
      <description>&lt;P&gt;In my personal opinion, yes - the SCD components are only suitable for relatively small tables with a small number of changes.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;The SCDELT component, however, does all the work on the target database with just 4 generated queries. The one limitation of the SCDELT component is that it requires both tables (staging and target) to be in the same schema. This is not always convenient, but it can be managed by creating and dropping the staging table.&lt;/P&gt;</description>
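The same-schema workaround mentioned above can be sketched roughly as follows. This is a hypothetical example, not from the thread: the `dwh`/`staging` schema names and table copy are illustrative, assuming the dimension lives in `dwh` and the persistent staging area elsewhere.

```sql
-- Copy the staging data into the target schema so SCDELT sees both
-- tables side by side (schema and table names are hypothetical).
CREATE TABLE dwh.stg_acnt AS
SELECT * FROM staging.stg_acnt;

-- ... run the tDBSCDELT job against dwh.stg_acnt / dwh.dim_acnt_scd ...

-- Drop the temporary copy afterwards.
DROP TABLE dwh.stg_acnt;
```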
      <pubDate>Sun, 21 Jun 2020 01:14:59 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226456#M18359</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2020-06-21T01:14:59Z</dc:date>
    </item>
    <item>
      <title>Re: Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226457#M18360</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt; 
&lt;P&gt;Thanks for the reply. I'll look into that. The shortcoming of having both the staging and target in one schema is a strange one. I'll have to accommodate for that as I'm using a persistent staging area in a different schema currently.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Roland&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jun 2020 14:11:06 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226457#M18360</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-06-22T14:11:06Z</dc:date>
    </item>
    <item>
      <title>Re: Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226458#M18361</link>
      <description>&lt;P&gt;Possibly it is a bug; when I try to use a table name with a schema, the job returns an error.&lt;/P&gt;&lt;P&gt;You can test it.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jun 2020 14:17:59 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226458#M18361</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2020-06-22T14:17:59Z</dc:date>
    </item>
    <item>
      <title>Re: Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226459#M18362</link>
      <description>&lt;P&gt;Okay, I'll try something out.&lt;/P&gt;&lt;P&gt;Roland&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jun 2020 14:26:28 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226459#M18362</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-06-22T14:26:28Z</dc:date>
    </item>
    <item>
      <title>Re: Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226460#M18363</link>
      <description>&lt;P&gt;Another condition I noticed is that all column names of the source table need to be the same as in the target dimension. I initially had a tMap component between the source and target that renamed the columns and changed some of the data types, but I suppose I'll first have to create a separate intermediate staging table for that.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jun 2020 17:50:47 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226460#M18363</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-06-23T17:50:47Z</dc:date>
    </item>
    <item>
      <title>Re: Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226461#M18364</link>
      <description>&lt;P&gt;Hi, I've looked into the ELT solution, but I don't understand how it works.&lt;/P&gt;&lt;P&gt;I created a separate simplified job to get a better understanding of the mechanics.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It consists of an incoming stg_acnt table (see screenshot) with 1 record and another dim_acnt table with 2 records (screenshot).&lt;BR /&gt;The create and insert statements are below as well.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;CREATE TABLE test.stg_acnt&lt;BR /&gt;(acnt_id VARCHAR (50)&lt;BR /&gt;, prod_code VARCHAR (10)&lt;BR /&gt;, open_dt integer&lt;BR /&gt;, valid_from_dt TIMESTAMP&lt;BR /&gt;, valid_to_dt TIMESTAMP&lt;BR /&gt;);&lt;/P&gt;&lt;P&gt;INSERT INTO test.stg_acnt VALUES ('A511', 'BBB', 20200623, CURRENT_DATE, '2199-12-31 23:59:00');&lt;BR /&gt;COMMIT;&lt;/P&gt;&lt;P&gt;CREATE TABLE test.dim_acnt_scd&lt;BR /&gt;( dim_acnt_id SERIAL&lt;BR /&gt;, acnt_id VARCHAR (50)&lt;BR /&gt;, prod_code VARCHAR (10)&lt;BR /&gt;, open_dt integer&lt;BR /&gt;, valid_from_dt TIMESTAMP&lt;BR /&gt;, valid_to_dt TIMESTAMP&lt;BR /&gt;);&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;INSERT INTO test.dim_acnt_scd VALUES (1, 'A511', 'AAA', 20200602, '2020-06-02 00:00:00', '2199-12-31 23:59:00');&lt;BR /&gt;INSERT INTO test.dim_acnt_scd VALUES (2, 'B777', 'AAA', 20170515, '2017-05-15 00:00:00', '2199-12-31 23:59:00');&lt;BR /&gt;COMMIT;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I created the job as follows (see screenshots). 
A Postgres DB input component reads in the staging table and is connected to the ELT component.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;source table: "stg_acnt"&lt;BR /&gt;table: "dim_acnt_scd"&lt;BR /&gt;surrogate key: dim_acnt_id&lt;BR /&gt;creation: auto increment&lt;BR /&gt;Source key: acnt_id&lt;BR /&gt;SCD type 2 field: prod_code&lt;BR /&gt;SCD type 1: open_dt&lt;BR /&gt;start date: valid_from_dt&lt;BR /&gt;end date: valid_to_dt&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Based on account id 'A511' and a change in product code, which I classified as SCD Type 2, I expected one record to be added to the dim_acnt_scd table, which did not happen.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Roland&lt;/P&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Mb5y"&gt;tDBInput.png&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Mb63"&gt;tDBSCDELT_1 part1.png&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Mb6D"&gt;tDBSCDELT_1 part2.png&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Mb6I"&gt;test_dim_acnt_scd.png&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Mb6N"&gt;test_stg_acnt.png&lt;/A&gt;</description>
      <pubDate>Wed, 24 Jun 2020 13:48:21 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226461#M18364</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-06-24T13:48:21Z</dc:date>
    </item>
    <item>
      <title>Re: Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226462#M18365</link>
      <description>&lt;P&gt;The world is not ideal &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For a better understanding of how it works the Talend way, you can enable statement logging on the PostgreSQL server and check the actual queries Talend generates for you. You will see:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;the columns valid_from and valid_to are located and managed in the target table&lt;/LI&gt;&lt;LI&gt;Talend will use the &lt;STRONG&gt;current timestamp&lt;/STRONG&gt; as valid_from, and &lt;STRONG&gt;null&lt;/STRONG&gt; as valid_to&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;There really is an inconsistency and incompatibility of approaches and technologies between the ETL and ELT components; this is why people often prefer to write their own SQL code, load everything into a staging table, and then just run 3-4 queries.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;regards, Vlad&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 24 Jun 2020 14:24:26 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226462#M18365</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2020-06-24T14:24:26Z</dc:date>
    </item>
    <item>
      <title>Re: Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226463#M18366</link>
      <description>&lt;P&gt;It certainly isn't! &lt;IMG id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://community.qlik.com/" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/P&gt; 
&lt;P&gt;Where do I enable the logging statements? I don't see those options under the basic and debug run.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Jun 2020 14:28:19 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226463#M18366</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-06-24T14:28:19Z</dc:date>
    </item>
    <item>
      <title>Re: Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226464#M18367</link>
      <description>&lt;P&gt;This is a PostgreSQL option; you need to enable it on the PostgreSQL server, in the logging section of postgresql.conf:&lt;/P&gt;
&lt;PRE&gt;logging_collector = on
log_statement = 'all'                   # none, ddl, mod, all&lt;/PRE&gt;</description>
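As an alternative to editing postgresql.conf by hand, a superuser can set the same option from a SQL session. A minimal sketch (note that logging_collector itself cannot be changed this way - it only takes effect after a full server restart, while log_statement takes effect on a reload):

```sql
-- Equivalent to editing postgresql.conf (requires superuser).
ALTER SYSTEM SET log_statement = 'all';

-- log_statement takes effect after a configuration reload.
SELECT pg_reload_conf();

-- Verify the active setting.
SHOW log_statement;
```

The generated queries then appear in the server log files; remember to set log_statement back to its previous value afterwards, since logging every statement is costly on a busy server.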
      <pubDate>Wed, 24 Jun 2020 14:31:47 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226464#M18367</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2020-06-24T14:31:47Z</dc:date>
    </item>
    <item>
      <title>Re: Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226465#M18368</link>
      <description>&lt;P&gt;Thanks. I had initially looked at the code where the update and insert statement strings were created, but this is definitely better. Apparently, for this simple example, it generated the following insert statements for the type 2 change (see below).&lt;/P&gt; 
&lt;P&gt;The first query I can relate to more, as the product code is being compared between the two tables, but then there is the condition&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;d."valid_to_dt" = '2020-06-24 09:45:06.601', which is applied to the existing record in the dimension table. This is odd, since for an existing record I've set valid_to_dt by default to 2199 or so, and therefore this query does not return anything. The second query is also strange, since there is a WHERE clause requiring the source key to be null.&lt;/P&gt; 
&lt;P&gt;&lt;BR /&gt;INSERT INTO "test"."dim_acnt_scd"("acnt_id", "open_dt", "prod_code", "valid_from_dt", "valid_to_dt")&lt;BR /&gt;SELECT t."acnt_id", t."open_dt", t."prod_code", '2020-06-24 09:45:06.601', NULL&lt;BR /&gt;FROM "test"."stg_acnt" t, "test"."dim_acnt_scd" d&lt;BR /&gt;WHERE d."acnt_id" = t."acnt_id" AND&lt;BR /&gt;( COALESCE(d."prod_code"&amp;lt;&amp;gt; t."prod_code", (d."prod_code" is null and not t."prod_code" is null) or (not d."prod_code" is null and t."prod_code" is null)))&lt;BR /&gt;AND d."valid_to_dt" = '2020-06-24 09:45:06.601'&lt;/P&gt; 
&lt;P&gt;&lt;BR /&gt;INSERT INTO "test"."dim_acnt_scd"("acnt_id", "open_dt", "prod_code", "valid_from_dt", "valid_to_dt")&lt;BR /&gt;SELECT t."acnt_id", t."open_dt", t."prod_code", '2020-06-24 09:45:06.601', NULL&lt;BR /&gt;FROM "test"."stg_acnt" t&lt;BR /&gt;LEFT JOIN "test"."dim_acnt_scd" d&lt;BR /&gt;ON d."acnt_id" = t."acnt_id"&lt;BR /&gt;WHERE ( d."acnt_id" IS NULL)&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 24 Jun 2020 15:04:55 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226465#M18368</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-06-24T15:04:55Z</dc:date>
    </item>
    <item>
      <title>Re: Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226466#M18369</link>
      <description>Yes, it mean - you need to have empty target table for proper ELT work&lt;BR /&gt;</description>
      <pubDate>Wed, 24 Jun 2020 15:10:37 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226466#M18369</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2020-06-24T15:10:37Z</dc:date>
    </item>
    <item>
      <title>Re: Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226467#M18370</link>
      <description>&lt;P&gt;Okay, I see now that the insert statement is preceded by an update statement that sets the valid_to_dt.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;UPDATE "test"."dim_acnt_scd" d&lt;BR /&gt;SET "valid_to_dt" = '2020-06-24 09:45:06.601'&lt;BR /&gt;FROM "test"."stg_acnt" t&lt;BR /&gt;WHERE d."acnt_id" = t."acnt_id"&lt;BR /&gt;AND ( COALESCE(d."prod_code"&amp;lt;&amp;gt; t."prod_code", (d."prod_code" is null and not t."prod_code" is null) or (not d."prod_code" is null and t."prod_code" is NULL)))&lt;BR /&gt;AND d."valid_to_dt" IS NULL&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It also requires the existing record's valid_to_dt to be null instead of the 2199 end date.&lt;/P&gt;</description>
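Given that the generated UPDATE matches open rows with `valid_to_dt IS NULL`, an existing dimension that uses the sentinel end date can be adapted with a one-off migration. A minimal sketch, assuming the `'2199-12-31 23:59:00'` sentinel from the table definitions earlier in the thread:

```sql
-- One-off migration: mark currently-open rows with NULL instead of the
-- sentinel end date, so the generated UPDATE/INSERT queries match them.
UPDATE test.dim_acnt_scd
SET valid_to_dt = NULL
WHERE valid_to_dt = '2199-12-31 23:59:00';
```

Any downstream queries that filter on the 2199 end date would of course need the same `IS NULL` convention.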
      <pubDate>Wed, 24 Jun 2020 15:10:38 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226467#M18370</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-06-24T15:10:38Z</dc:date>
    </item>
    <item>
      <title>Re: Slowly changing dimension - PostgresQL - scalability issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226468#M18371</link>
      <description>&lt;P&gt;The simple example is working now after I updated the valid_to_dt field to null. Great... now on to the other one.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Jun 2020 15:13:34 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Slowly-changing-dimension-PostgresQL-scalability-issue/m-p/2226468#M18371</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-06-24T15:13:34Z</dc:date>
    </item>
  </channel>
</rss>

