<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Processing of great data volume in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Processing-of-great-data-volume/m-p/2270873#M48646</link>
    <description>Hello,
&lt;BR /&gt;Thanks for your quick answer.
&lt;BR /&gt;I was aware of this new functionality in v2.4, but it is only available for lookups. My concern, and the point I was underlining in my previous message, is the main output flow: in previous TOS versions, not only data for the lookup flows but also data from the main flow is stored in memory.
&lt;BR /&gt;Thanks
&lt;BR /&gt;Evagelos</description>
    <pubDate>Thu, 29 May 2008 14:29:32 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2008-05-29T14:29:32Z</dc:date>
    <item>
      <title>Processing of great data volume</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Processing-of-great-data-volume/m-p/2270871#M48644</link>
      <description>Hello,
&lt;BR /&gt;I am wondering if it would be possible to avoid loading all the data from the main input flow into memory. Another solution would be to process only a limited number of records from the main input at a time (this could be a parameter in the database output components), to avoid out-of-memory issues.
&lt;BR /&gt;Let me explain a little:
&lt;BR /&gt;When a large amount of data has to be processed (several million records) in ETL flows, Talend needs a server with a lot of memory, because all source data records are loaded into server memory; otherwise we get an out-of-memory error.
&lt;BR /&gt;In addition, if data volumes increase, it is not guaranteed that the memory allotted to Talend on the server will be sufficient... which is not really safe for a daily enterprise night batch.
&lt;BR /&gt;From a global point of view, this limits the number of records a job can process, because of the server's memory limitations.
&lt;BR /&gt;Is such an enhancement possible in Talend?
&lt;BR /&gt;BR
&lt;BR /&gt;Evagelos</description>
      <pubDate>Sat, 16 Nov 2024 14:21:07 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Processing-of-great-data-volume/m-p/2270871#M48644</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T14:21:07Z</dc:date>
    </item>
    <item>
      <title>Re: Processing of great data volume</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Processing-of-great-data-volume/m-p/2270872#M48645</link>
      <description>Since TOS 2.4 RC1, Talend has provided the "Stored on disk" option, visible on each lookup table in tMap.
&lt;BR /&gt;This option allows you to load as many rows as you want into a lookup, with no memory limit; the only limit is the disk space available for temporary data.
&lt;BR /&gt;Don't forget to set a valid path for temporary files in the "Properties view" of tMap.
&lt;BR /&gt;You are welcome to test this functionality as soon as possible &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;</description>
      <pubDate>Thu, 29 May 2008 13:48:01 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Processing-of-great-data-volume/m-p/2270872#M48645</guid>
      <dc:creator>amaumont</dc:creator>
      <dc:date>2008-05-29T13:48:01Z</dc:date>
    </item>
    <item>
      <title>Re: Processing of great data volume</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Processing-of-great-data-volume/m-p/2270873#M48646</link>
      <description>Hello,
&lt;BR /&gt;Thanks for your quick answer.
&lt;BR /&gt;I was aware of this new functionality in v2.4, but it is only available for lookups. My concern, and the point I was underlining in my previous message, is the main output flow: in previous TOS versions, not only data for the lookup flows but also data from the main flow is stored in memory.
&lt;BR /&gt;Thanks
&lt;BR /&gt;Evagelos</description>
      <pubDate>Thu, 29 May 2008 14:29:32 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Processing-of-great-data-volume/m-p/2270873#M48646</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2008-05-29T14:29:32Z</dc:date>
    </item>
    <item>
      <title>Re: Processing of great data volume</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Processing-of-great-data-volume/m-p/2270874#M48647</link>
      <description>Hello,
&lt;BR /&gt;In general, main flows are not kept in memory. There are only two or three exceptions, with specific components like tSortRow or tAggregateRow.
&lt;BR /&gt;tSortRow already has the "Sort on disk" option in 2.3 (see its advanced settings).
&lt;BR /&gt;tAggregateRow only puts the aggregate output in memory. In most cases this is not a strong limitation. We can add the same "Sort on disk" option to the component if required.
&lt;BR /&gt;Regards,</description>
      <pubDate>Fri, 30 May 2008 00:48:31 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Processing-of-great-data-volume/m-p/2270874#M48647</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2008-05-30T00:48:31Z</dc:date>
    </item>
    <item>
      <title>Re: Processing of great data volume</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Processing-of-great-data-volume/m-p/2270875#M48648</link>
      <description>Hi,
&lt;BR /&gt;Sorry that I have to reactivate this thread again. I think Evagelos is right. Even though the "Store on disk" option can prevent the out-of-memory error, it can slow down the process. Would it not be better to load only a limited number of records into memory and process them? That would be faster and would make TOS more scalable.</description>
      <pubDate>Thu, 07 Jan 2010 12:56:01 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Processing-of-great-data-volume/m-p/2270875#M48648</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2010-01-07T12:56:01Z</dc:date>
    </item>
  </channel>
</rss>

