<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SQL in a tflowtoiterate performance degrading in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/SQL-in-a-tflowtoiterate-performance-degrading/m-p/2245668#M31446</link>
    <description>Here are the requested screenshots.&lt;BR /&gt;The red arrow represents the process that is degrading fast.&lt;BR /&gt;The black arrow leads to the (a), (b), (c) portion, which is irrelevant since it's very fast.&lt;BR /&gt;The green arrow is an OnSubJobOk that will finalize the ETL.</description>
    <pubDate>Fri, 30 Nov 2012 16:02:10 GMT</pubDate>
    <dc:creator>manueld</dc:creator>
    <dc:date>2012-11-30T16:02:10Z</dc:date>
    <item>
      <title>SQL in a tflowtoiterate performance degrading</title>
      <link>https://community.qlik.com/t5/Talend-Studio/SQL-in-a-tflowtoiterate-performance-degrading/m-p/2245666#M31444</link>
      <description>Hello all, 
&lt;BR /&gt;Using Open Studio TOS_DI-Win32-r78327-V5.0.2 
&lt;BR /&gt;---------------------- 
&lt;BR /&gt;Context of the source 
&lt;BR /&gt;---------------------- 
&lt;BR /&gt;I'm stuck with a poorly designed PostgreSQL 8.3 table: 
&lt;BR /&gt;- 14 million rows covering only the last 3 months 
&lt;BR /&gt;- typically between 150k and 200k rows per business day, less on weekends 
&lt;BR /&gt;- 55 columns and 23 indexes 
&lt;BR /&gt;- vacuum is not well tuned (if tuned at all) 
&lt;BR /&gt;- every day, data is removed and added to keep only 3 months 
&lt;BR /&gt;- no index rebuilds, with enormous holes at the beginning of the table 
&lt;BR /&gt;Eventually all those design flaws will be addressed. Until then, I need to work with what I have. 
&lt;BR /&gt;---------------------- 
&lt;BR /&gt;The purpose of the ETL 
&lt;BR /&gt;---------------------- 
&lt;BR /&gt;I have to check data integrity on a daily basis, but sometimes past rows are 
&lt;BR /&gt;updated. Not many, but at least a few percent. Also, for the initial release of my sanity check, 
&lt;BR /&gt;I'll have to run the thing over 90 days. After that, I'll have to go back 10 days every day. 
&lt;BR /&gt;---------------------- 
&lt;BR /&gt;Solution so far 
&lt;BR /&gt;---------------------- 
&lt;BR /&gt;I tried to grab the whole thing in a single tInput... no luck there... so: 
&lt;BR /&gt;tInput ( generate dates from / to ) 
&lt;BR /&gt; --&amp;gt; tFlowToIterate 
&lt;BR /&gt; --&amp;gt; tInput ( with a dynamic where clause on an indexed timestamp - tried with buffer at 10k or 20k) 
&lt;BR /&gt; --&amp;gt; tOutput ( with a drop table if exists in a tmp schema - tried with buffer at 10k or 20k) 
&lt;BR /&gt; --&amp;gt; several other temp tables with aggregated content, then a few final small tHashOutput components 
&lt;BR /&gt;I don't use an existing connection; instead, I use context 
&lt;BR /&gt;variables for all the PostgreSQL components. 
&lt;BR /&gt;---------------------- 
&lt;BR /&gt;results 
&lt;BR /&gt;---------------------- 
&lt;BR /&gt;When the date range is only 1 day of processing, I can reach 3500 / 4000 rows per second, 
&lt;BR /&gt;which is fine, dealing with about a minute per day. That's my baseline. 
&lt;BR /&gt;The problem is that when I enlarge the date range (number of tFlowToIterate iterations goes up), performance degrades fast. 
&lt;BR /&gt;For 2 days, the first day is the same, but I drop to 1000 rows per second on the second. 
&lt;BR /&gt;On my last test over 6 days, the final day couldn't reach 500 rows per second. 
&lt;BR /&gt;I have not even tried the needed 10 days. 
&lt;BR /&gt;---------------------- 
&lt;BR /&gt;now what ! 
&lt;BR /&gt;---------------------- 
&lt;BR /&gt;I don't know what to do to improve the extract performance. 
&lt;BR /&gt;Should I call a complete subjob after the tFlowToIterate instead of keeping everything in the ETL? 
&lt;BR /&gt;There is surely a buffer that isn't being emptied, right? 
&lt;BR /&gt;Any help would be appreciated!! 
&lt;BR /&gt;Manuel</description>
      <pubDate>Fri, 30 Nov 2012 03:08:12 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/SQL-in-a-tflowtoiterate-performance-degrading/m-p/2245666#M31444</guid>
      <dc:creator>manueld</dc:creator>
      <dc:date>2012-11-30T03:08:12Z</dc:date>
    </item>
    <item>
      <title>Re: SQL in a tflowtoiterate performance degrading</title>
      <link>https://community.qlik.com/t5/Talend-Studio/SQL-in-a-tflowtoiterate-performance-degrading/m-p/2245667#M31445</link>
      <description>Hi &lt;BR /&gt;Can you please upload some screenshots of the job, so that we can see what we can do to optimize the job design?</description>
      <pubDate>Fri, 30 Nov 2012 11:16:45 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/SQL-in-a-tflowtoiterate-performance-degrading/m-p/2245667#M31445</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-11-30T11:16:45Z</dc:date>
    </item>
    <item>
      <title>Re: SQL in a tflowtoiterate performance degrading</title>
      <link>https://community.qlik.com/t5/Talend-Studio/SQL-in-a-tflowtoiterate-performance-degrading/m-p/2245668#M31446</link>
      <description>Here are the requested screenshots.&lt;BR /&gt;The red arrow represents the process that is degrading fast.&lt;BR /&gt;The black arrow leads to the (a), (b), (c) portion, which is irrelevant since it's very fast.&lt;BR /&gt;The green arrow is an OnSubJobOk that will finalize the ETL.</description>
      <pubDate>Fri, 30 Nov 2012 16:02:10 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/SQL-in-a-tflowtoiterate-performance-degrading/m-p/2245668#M31446</guid>
      <dc:creator>manueld</dc:creator>
      <dc:date>2012-11-30T16:02:10Z</dc:date>
    </item>
    <item>
      <title>Re: SQL in a tflowtoiterate performance degrading</title>
      <link>https://community.qlik.com/t5/Talend-Studio/SQL-in-a-tflowtoiterate-performance-degrading/m-p/2245669#M31447</link>
      <description>Any ideas? Anyone?</description>
      <pubDate>Mon, 03 Dec 2012 19:07:28 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/SQL-in-a-tflowtoiterate-performance-degrading/m-p/2245669#M31447</guid>
      <dc:creator>manueld</dc:creator>
      <dc:date>2012-12-03T19:07:28Z</dc:date>
    </item>
    <item>
      <title>Re: SQL in a tflowtoiterate performance degrading</title>
      <link>https://community.qlik.com/t5/Talend-Studio/SQL-in-a-tflowtoiterate-performance-degrading/m-p/2245670#M31448</link>
      <description>Hi 
&lt;BR /&gt;I have the following suggestions to improve the performance: 
&lt;BR /&gt;1. Enable parallel execution on the iterate link: click the iterate link between tFlowToIterate and the tJava component, check the box 'Enable parallel execution' and set the number of parallel executions. 
&lt;BR /&gt;2. Why do you use an iterate link after the tJava component? Do you really need an iterate? Use OnComponentOk instead of iterate. 
&lt;BR /&gt;3. Output the result to a temporary file instead of memory for large data sets. 
&lt;BR /&gt;Shong</description>
      <pubDate>Tue, 04 Dec 2012 04:54:45 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/SQL-in-a-tflowtoiterate-performance-degrading/m-p/2245670#M31448</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-12-04T04:54:45Z</dc:date>
    </item>
  </channel>
</rss>

