<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic tPostgresQLInput performance scaling issue in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/tPostgresQLInput-performance-scaling-issue/m-p/2316066#M86648</link>
<description>&lt;P&gt;I’m still new to Talend Open Studio (TOS) and figuring things out as I go. Below is the next issue I have.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I initially had a job with a tPostgresqlInput component that extracted 5 columns from one table, a tMap in which I did some data type conversions and created some date fields, and then loaded the data back into PostgreSQL. That ran reasonably fast for a source table with fewer than 100,000 records or so. Then I received a source table with 32 million records and performance ground to a halt at about 70 rows/sec, even after I increased the JVM arguments to -Xms2560M and -Xmx7000M on my laptop.&lt;/P&gt; 
&lt;P&gt;I figured it must have been the tMap component, in which I was perhaps performing too many conversions, so I split it out and used a dedicated tConvertType. Performance didn’t improve either.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Then I decided to create a test job (attached screenshot) and reduced the complexity to only the tPostgresqlInput component with a simple SELECT of just 5 columns and a tFileOutputDelimited component writing to a flat file. For the 32 million record dataset, the job would take ages to even start processing and eventually fail with a Java heap space error.&lt;/P&gt; 
&lt;P&gt;I then recreated a similar job in SSIS, and SSIS loaded all 32 million records in 3 minutes and 42 seconds at an average speed of 144,144 rows/sec.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I also searched the forums here and came across another user with similar performance issues for PostgreSQL. &lt;U&gt;&lt;A href="https://community.qlik.com/s/feed/0D53p00007vCs2DCAS" target="_blank"&gt;https://community.talend.com/t5/Design-and-Development/tPostgresqlInput-Query-Slow-Through-Talend-Yet-Fast-When-Running/td-p/54319&lt;/A&gt;&lt;/U&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;In the meantime, I’ve played with some numbers: fetching 1M records still gave good performance at 232,612 rows/sec; 10M ran at 240,028 rows/sec; 20M dropped to 76,377 rows/sec; and 30M dropped to 20,065 rows/sec. Above 31M records the job seemed to freeze.&lt;/P&gt; 
&lt;P&gt;Does anyone know what’s going on?&lt;/P&gt; 
</description>
    <pubDate>Wed, 03 Jun 2020 18:59:21 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2020-06-03T18:59:21Z</dc:date>
    <item>
      <title>tPostgresQLInput performance scaling issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/tPostgresQLInput-performance-scaling-issue/m-p/2316066#M86648</link>
<description>&lt;P&gt;I’m still new to Talend Open Studio (TOS) and figuring things out as I go. Below is the next issue I have.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I initially had a job with a tPostgresqlInput component that extracted 5 columns from one table, a tMap in which I did some data type conversions and created some date fields, and then loaded the data back into PostgreSQL. That ran reasonably fast for a source table with fewer than 100,000 records or so. Then I received a source table with 32 million records and performance ground to a halt at about 70 rows/sec, even after I increased the JVM arguments to -Xms2560M and -Xmx7000M on my laptop.&lt;/P&gt; 
&lt;P&gt;I figured it must have been the tMap component, in which I was perhaps performing too many conversions, so I split it out and used a dedicated tConvertType. Performance didn’t improve either.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Then I decided to create a test job (attached screenshot) and reduced the complexity to only the tPostgresqlInput component with a simple SELECT of just 5 columns and a tFileOutputDelimited component writing to a flat file. For the 32 million record dataset, the job would take ages to even start processing and eventually fail with a Java heap space error.&lt;/P&gt; 
&lt;P&gt;I then recreated a similar job in SSIS, and SSIS loaded all 32 million records in 3 minutes and 42 seconds at an average speed of 144,144 rows/sec.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I also searched the forums here and came across another user with similar performance issues for PostgreSQL. &lt;U&gt;&lt;A href="https://community.qlik.com/s/feed/0D53p00007vCs2DCAS" target="_blank"&gt;https://community.talend.com/t5/Design-and-Development/tPostgresqlInput-Query-Slow-Through-Talend-Yet-Fast-When-Running/td-p/54319&lt;/A&gt;&lt;/U&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;In the meantime, I’ve played with some numbers: fetching 1M records still gave good performance at 232,612 rows/sec; 10M ran at 240,028 rows/sec; 20M dropped to 76,377 rows/sec; and 30M dropped to 20,065 rows/sec. Above 31M records the job seemed to freeze.&lt;/P&gt; 
&lt;P&gt;Does anyone know what’s going on?&lt;/P&gt; 
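&lt;P&gt;One pattern that would explain the heap error and the memory scaling with row count (an assumption on my part, not something I have confirmed for this job) is that the PostgreSQL JDBC driver buffers the entire result set in memory unless autocommit is disabled and a non-zero fetch size is set, in which case it streams rows through a server-side cursor instead. I believe tPostgresqlInput has a cursor option in its advanced settings that corresponds to this. A minimal plain-JDBC sketch of the streaming setup (URL, credentials, and table name are placeholders):&lt;/P&gt; 

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingFetch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret")) {
            // Both settings are needed for the PostgreSQL JDBC driver to use
            // a server-side cursor instead of buffering every row on the heap:
            conn.setAutoCommit(false);       // 1. autocommit off
            try (Statement st = conn.createStatement()) {
                st.setFetchSize(10_000);     // 2. non-zero fetch size per round trip
                try (ResultSet rs = st.executeQuery(
                        "SELECT col1, col2, col3, col4, col5 FROM big_table")) {
                    long n = 0;
                    while (rs.next()) {
                        n++;                 // process row; memory use stays flat
                    }
                    System.out.println(n + " rows");
                }
            }
        }
    }
}
```

&lt;P&gt;If that is the cause, it would also explain why raising -Xmx helps only up to a point: heap demand grows linearly with row count until the buffer no longer fits.&lt;/P&gt; 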
</description>
      <pubDate>Wed, 03 Jun 2020 18:59:21 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/tPostgresQLInput-performance-scaling-issue/m-p/2316066#M86648</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-06-03T18:59:21Z</dc:date>
    </item>
  </channel>
</rss>

