<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How can a large table be processed efficiently? in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/How-can-a-large-table-be-processed-efficiently/m-p/2317618#M88045</link>
    <description>Hi Tom,&lt;BR /&gt;1) The bulk DB components are designed for high-volume loads; they write the data to a temporary file and then bulk-load it.&lt;BR /&gt;    tInfoBrightOutput is a custom component. Its performance may be similar to that of tMysqlOutput (inefficient).&lt;BR /&gt;2) If the table has a sequential primary key, you can restrict the query of tMysqlInput:&lt;BR /&gt;    &lt;PRE&gt;        ....  where id &amp;lt; 1000000&lt;/PRE&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Pedro</description>
    <pubDate>Fri, 10 Feb 2012 06:42:43 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2012-02-10T06:42:43Z</dc:date>
    <item>
      <title>How can a large table be processed efficiently?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-can-a-large-table-be-processed-efficiently/m-p/2317617#M88044</link>
      <description>Hi,
&lt;BR /&gt;I have a challenge (or two) that someone can hopefully help me out with.
&lt;BR /&gt;I have a job that reads a table of approximately 20 million rows (via a tMysqlInput component) and writes those rows to a table that resides in another database. A tInfoBrightOutput component is used for the table being loaded.
&lt;BR /&gt;When the job ran, it failed with an out-of-memory error. The 'Enable stream' option on the tMysqlInput component was then activated to get around the out-of-memory error. This worked, BUT the job took about 5 hours to load the 20 million rows. A similar test using a tMysqlOutputBulkExec component took 15 minutes to run.
&lt;BR /&gt;With the above in mind, there are a couple of things I would like to understand.
&lt;BR /&gt;1) Are there any obvious settings that should be put in place when using the tInfoBrightOutput component? The difference in the time it takes to load the same table using different component types is so glaring that I suspect I am overlooking something.
&lt;BR /&gt;2) Is there a way to read the data from the input table (using the tMysqlInput component) in sections, e.g. read and process a million rows at a time? I am thinking that if the table can be read in sections, the data can be held in memory and the 'Enable stream' option will not be needed.
&lt;BR /&gt;Also, why use the tInfoBrightOutput component at all if tMysqlOutputBulkExec is so much more performant? We want to make use of the InfoBright Knowledge Grid.
&lt;BR /&gt;Thank you in advance for any help you can offer on this.
&lt;BR /&gt;Regards,
&lt;BR /&gt;Tom</description>
      <pubDate>Sat, 16 Nov 2024 12:23:28 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-can-a-large-table-be-processed-efficiently/m-p/2317617#M88044</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T12:23:28Z</dc:date>
    </item>
    <item>
      <title>Re: How can a large table be processed efficiently?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-can-a-large-table-be-processed-efficiently/m-p/2317618#M88045</link>
      <description>Hi Tom,&lt;BR /&gt;1) The bulk DB components are designed for high-volume loads; they write the data to a temporary file and then bulk-load it.&lt;BR /&gt;    tInfoBrightOutput is a custom component. Its performance may be similar to that of tMysqlOutput (inefficient).&lt;BR /&gt;2) If the table has a sequential primary key, you can restrict the query of tMysqlInput:&lt;BR /&gt;    &lt;PRE&gt;        ....  where id &amp;lt; 1000000&lt;/PRE&gt;
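&lt;BR /&gt;For illustration, a complete query along those lines might look like this (the table and column names are just placeholders):&lt;BR /&gt;&lt;PRE&gt;"SELECT `TableName`.`id`, `TableName`.`name` FROM `TableName` where id &amp;lt; 1000000"&lt;/PRE&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Pedro</description>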
      <pubDate>Fri, 10 Feb 2012 06:42:43 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-can-a-large-table-be-processed-efficiently/m-p/2317618#M88045</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-02-10T06:42:43Z</dc:date>
    </item>
    <item>
      <title>Re: How can a large table be processed efficiently?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-can-a-large-table-be-processed-efficiently/m-p/2317619#M88046</link>
      <description>Hi Pedro,
&lt;BR /&gt;Thank you for your response. For now, I am going to try to improve the way the job works with the InfoBright Loader component, because we want to make use of the InfoBright Knowledge Grid, unless there is a way to still do that while using the bulk DB component.
&lt;BR /&gt;I still have a question, though, about processing the input table in sections. Do I need to place the read from the table inside an iteration or loop so that the process reads 'x' rows at a time? I believe the example you listed has me reading the first 1 million rows and not the rest of the table.
&lt;BR /&gt;If the read should be placed inside an iteration or loop, could you explain or show how that is done (or point me in the right direction)?
&lt;BR /&gt;Thank you and have a great day.
&lt;BR /&gt;Tom</description>
      <pubDate>Fri, 10 Feb 2012 11:41:43 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-can-a-large-table-be-processed-efficiently/m-p/2317619#M88046</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-02-10T11:41:43Z</dc:date>
    </item>
    <item>
      <title>Re: How can a large table be processed efficiently?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-can-a-large-table-be-processed-efficiently/m-p/2317620#M88047</link>
      <description>Hi Tom,
&lt;BR /&gt;I think you can use tLoop and read 2 million (or 1 million) rows per loop iteration, which will improve performance:
&lt;BR /&gt;tLoop -&amp;gt; tMysqlInput -&amp;gt; tInfoBrightOutput -&amp;gt; tJava.
&lt;BR /&gt;Use context variables to set the query in tMysqlInput.
&lt;BR /&gt;For example:
&lt;BR /&gt;
&lt;PRE&gt;"SELECT &lt;BR /&gt;  `TableName`.`id`, &lt;BR /&gt;  `TableName`.`name`&lt;BR /&gt;FROM `TableName`&lt;BR /&gt;where id&amp;gt;="+context.new1+" and id&amp;lt;="+context.new2&lt;/PRE&gt;
&lt;BR /&gt;Regards, 
&lt;BR /&gt;Pedro</description>
      <pubDate>Mon, 13 Feb 2012 05:31:08 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-can-a-large-table-be-processed-efficiently/m-p/2317620#M88047</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-02-13T05:31:08Z</dc:date>
    </item>
    <item>
      <title>Re: How can a large table be processed efficiently?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-can-a-large-table-be-processed-efficiently/m-p/2317621#M88048</link>
      <description>Thank you, Pedro!</description>
      <pubDate>Mon, 13 Feb 2012 20:05:04 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-can-a-large-table-be-processed-efficiently/m-p/2317621#M88048</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-02-13T20:05:04Z</dc:date>
    </item>
  </channel>
</rss>