<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Big files (tFileInputPositional) in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304209#M76003</link>
    <description>Are you sure splitting it into two separate jobs makes a difference? In the end, the second job still has to process the 4 million rows handed over by the first one.</description>
    <pubDate>Thu, 25 Sep 2014 13:06:36 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2014-09-25T13:06:36Z</dc:date>
    <item>
      <title>Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304205#M75999</link>
      <description>Dear Talend Support Team, 
&lt;BR /&gt;We have a huge input file with more than 4 million rows. This file is read by tFileInputPositional, and its data flow is then linked 
&lt;BR /&gt;to tMap. There are additional lookups on database tables, but these tables don't contain many rows. The problem is the 
&lt;BR /&gt;enormous memory consumption. We need a way to keep memory usage moderate. Is there a way to read the huge input file in parts, 
&lt;BR /&gt;process them, and then read the rest? 
&lt;BR /&gt; 
&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MAvP.jpg"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/147579i95BAF98EAD62A539/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MAvP.jpg" alt="0683p000009MAvP.jpg" /&gt;&lt;/span&gt; 
&lt;BR /&gt;Kind regards, 
&lt;BR /&gt;Hilderich</description>
      <pubDate>Thu, 25 Sep 2014 11:23:37 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304205#M75999</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-09-25T11:23:37Z</dc:date>
    </item>
    <item>
      <title>Re: Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304206#M76000</link>
      <description>Hi Hilderich,
&lt;BR /&gt;To solve the memory problem, you can have tMap store records on the file system. In any case, when the tFileInput component reads the file, it cannot read all rows at once: it reads records in chunks and passes them on to tMap. It is the tMap component that collects all records in memory (or on the file system), performs the join, and passes the result to the next component after processing. Storing intermediate records on the file system will help solve the memory problem.
&lt;BR /&gt;This option is available in the property settings of the input section of tMap (third icon from the left at the top of the input side).
&lt;BR /&gt;Thanks
&lt;BR /&gt;Vaibhav</description>
      <pubDate>Thu, 25 Sep 2014 11:39:25 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304206#M76000</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-09-25T11:39:25Z</dc:date>
    </item>
    <item>
      <title>Re: Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304207#M76001</link>
      <description>Hello Vaibhav, 
&lt;BR /&gt;Thanks for your answer. I forgot to mention that this option (store temp data on disk) is already in use. Unfortunately, memory consumption has not improved. 
&lt;BR /&gt;While the job is running I can observe the temp files being written to disk, but consumption is still at its maximum. The problem might be the last tMap component before 
&lt;BR /&gt;the data are stored in the database. But this final tMap has no lookup, and therefore I cannot save the flow temporarily to disk there. Any other ideas? 
&lt;BR /&gt;Kind regards, 
&lt;BR /&gt;Hilderich</description>
      <pubDate>Thu, 25 Sep 2014 12:01:00 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304207#M76001</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-09-25T12:01:00Z</dc:date>
    </item>
    <item>
      <title>Re: Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304208#M76002</link>
      <description>Hi, 
&lt;BR /&gt;You can try disabling parts of the job to find out which component or section is consuming the memory. You could also break the job into small subjobs and pass data from parent to child, or use files between processing steps. Performing all tasks in a single job is not an optimal way to deal with large amounts of data and joins; if possible, you can even distribute the join processing across several stages. 
&lt;BR /&gt;Vaibhav</description>
      <pubDate>Thu, 25 Sep 2014 12:52:33 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304208#M76002</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-09-25T12:52:33Z</dc:date>
    </item>
    <item>
      <title>Re: Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304209#M76003</link>
      <description>Are you sure splitting it into two separate jobs makes a difference? In the end, the second job still has to process the 4 million rows handed over by the first one.</description>
      <pubDate>Thu, 25 Sep 2014 13:06:36 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304209#M76003</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-09-25T13:06:36Z</dc:date>
    </item>
    <item>
      <title>Re: Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304210#M76004</link>
      <description>The bottleneck is the tDenormalize component. Without it, memory consumption does not reach its limit. Any suggestions on how to replace it with a more efficient approach? 
&lt;BR /&gt;By the way: your image attachment function here is broken - I cannot attach any images anymore.</description>
      <pubDate>Thu, 25 Sep 2014 14:56:04 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304210#M76004</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-09-25T14:56:04Z</dc:date>
    </item>
    <item>
      <title>Re: Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304211#M76005</link>
      <description>Yes. What are you trying to do with tDenormalize?</description>
      <pubDate>Thu, 25 Sep 2014 15:07:34 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304211#M76005</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-09-25T15:07:34Z</dc:date>
    </item>
    <item>
      <title>Re: Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304212#M76006</link>
      <description>We need to group the data, but we exclude the field "LKZ" from the grouping. This way we get the "LKZ" values comma-separated, which is exactly what we want. &lt;BR /&gt;All of this is already realized by tDenormalize in the job above.</description>
      <pubDate>Thu, 25 Sep 2014 15:15:13 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304212#M76006</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-09-25T15:15:13Z</dc:date>
    </item>
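For readers following the thread: the grouping described in the post above can be sketched outside Talend. The following is a minimal Python sketch (not Talend-generated code) of what tDenormalize effectively does here: group rows on every field except "LKZ" and join the "LKZ" values with commas. The field names other than "LKZ" are hypothetical. Note that on unsorted input such an implementation must hold one buffered entry per distinct group in memory at once, which is consistent with the consumption reported in this thread.

```python
from collections import OrderedDict

def denormalize(rows, skip_field="LKZ", sep=","):
    """Group rows on all fields except skip_field and join the
    skipped field's values with sep. Buffers one entry per distinct
    group key, mirroring a denormalize over unsorted input."""
    groups = OrderedDict()
    for row in rows:
        # Group key: every (field, value) pair except the skipped field.
        key = tuple(sorted((f, v) for f, v in row.items() if f != skip_field))
        groups.setdefault(key, []).append(row[skip_field])
    for key, values in groups.items():
        out = dict(key)
        out[skip_field] = sep.join(values)
        yield out

rows = [
    {"id": "1", "name": "A", "LKZ": "DE"},
    {"id": "1", "name": "A", "LKZ": "FR"},
    {"id": "2", "name": "B", "LKZ": "IT"},
]
result = list(denormalize(rows))
# result[0]["LKZ"] is "DE,FR"; result[1]["LKZ"] is "IT"
```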
    <item>
      <title>Re: Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304213#M76007</link>
      <description>Just an idea...
&lt;BR /&gt;You can put a tFilterRow component before tDenormalize and distribute the rows based on a particular key value that does not conflict with the grouping required by tDenormalize. You can then have two tDenormalize components, one on the main flow and one on the reject flow, thereby dividing the memory usage across two components. You could also use a sort component before tDenormalize to feed it sorted data, so that it can process more quickly.</description>
      <pubDate>Thu, 25 Sep 2014 15:21:01 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304213#M76007</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-09-25T15:21:01Z</dc:date>
    </item>
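The split suggested above can be illustrated with a small Python sketch (hypothetical, not Talend code): a filter partitions the rows on the grouping key, and two independent denormalize passes each see only their own partition, so each buffers only part of the groups. The hash-based predicate stands in for the "particular key value" mentioned in the post, and the field names are assumptions.

```python
from itertools import chain

def partition(rows, keep):
    """Split rows into (main, reject) flows, as a filter component would."""
    main, reject = [], []
    for row in rows:
        (main if keep(row) else reject).append(row)
    return main, reject

def denormalize(rows, skip_field="LKZ", sep=","):
    """Group on all fields except skip_field; join the skipped values."""
    groups = {}
    for row in rows:
        key = tuple(sorted((f, v) for f, v in row.items() if f != skip_field))
        groups.setdefault(key, []).append(row[skip_field])
    return [dict(k, **{skip_field: sep.join(v)}) for k, v in groups.items()]

rows = [
    {"id": "1", "LKZ": "DE"}, {"id": "1", "LKZ": "FR"},
    {"id": "2", "LKZ": "IT"}, {"id": "3", "LKZ": "ES"},
]
# Partition on the grouping key itself, so the rows of one group
# never end up split across the two flows.
main, reject = partition(rows, keep=lambda r: hash(r["id"]) % 2 == 0)
combined = list(chain(denormalize(main), denormalize(reject)))
```

Each pass holds only its own partition's groups in memory, which is the point of the two-component layout described in the post.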
    <item>
      <title>Re: Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304214#M76008</link>
      <description>Thank you for your help and your suggestions. As far as I know, tSortRow is also a memory killer. I could imagine that tSortRow in combination with tDenormalize would blow up the memory. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;</description>
      <pubDate>Thu, 25 Sep 2014 15:45:14 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304214#M76008</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-09-25T15:45:14Z</dc:date>
    </item>
    <item>
      <title>Re: Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304215#M76009</link>
      <description>tSortRow can sort on disk via its advanced settings. You can then use tAggregateSortedRow with the list function to denormalize the data and reduce memory consumption.</description>
      <pubDate>Thu, 25 Sep 2014 16:05:55 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304215#M76009</guid>
      <dc:creator>rbaldwin</dc:creator>
      <dc:date>2014-09-25T16:05:55Z</dc:date>
    </item>
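The sorted-input approach suggested above is what makes streaming aggregation possible: once the rows arrive sorted on the grouping key, only the current group needs to be held in memory at a time, instead of every group at once. A minimal Python sketch of that idea follows (the external disk sort itself is left to tSortRow's advanced settings; field names other than "LKZ" are hypothetical):

```python
from itertools import groupby
from operator import itemgetter

def denormalize_sorted(rows, group_key, skip_field="LKZ", sep=","):
    """Streaming denormalize over input already sorted on group_key.
    Holds only one group in memory at a time, unlike an unsorted
    group-by that must buffer every group simultaneously."""
    for _, group in groupby(rows, key=itemgetter(*group_key)):
        buf = list(group)  # only the current group is buffered
        out = {f: buf[0][f] for f in group_key}
        out[skip_field] = sep.join(r[skip_field] for r in buf)
        yield out

rows = [  # already sorted on "id"
    {"id": "1", "LKZ": "DE"}, {"id": "1", "LKZ": "FR"},
    {"id": "2", "LKZ": "IT"},
]
result = list(denormalize_sorted(rows, group_key=["id"]))
# result == [{"id": "1", "LKZ": "DE,FR"}, {"id": "2", "LKZ": "IT"}]
```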
    <item>
      <title>Re: Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304216#M76010</link>
      <description>Hello rbaldwin,
&lt;BR /&gt;That sounds good. I am going to try it tomorrow and give you feedback right here. I am going home now.
&lt;BR /&gt;Kind regards,
&lt;BR /&gt;Hilderich</description>
      <pubDate>Thu, 25 Sep 2014 16:18:40 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304216#M76010</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-09-25T16:18:40Z</dc:date>
    </item>
    <item>
      <title>Re: Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304217#M76011</link>
      <description>Hi &lt;A href="http://www.talendforge.org/forum/profile.php?id=142236" target="_blank" rel="nofollow noopener noreferrer"&gt;hilderich&lt;/A&gt;,&lt;BR /&gt;&lt;BR /&gt;Is there any feedback on your issue?&lt;BR /&gt;Best regards&lt;BR /&gt;Sabrina</description>
      <pubDate>Mon, 27 Oct 2014 02:49:51 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304217#M76011</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-10-27T02:49:51Z</dc:date>
    </item>
    <item>
      <title>Re: Big files (tFileInputPositional)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304218#M76012</link>
      <description>This approach was helpful and is now in use.</description>
      <pubDate>Wed, 28 Jan 2015 11:55:31 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Big-files-tFileInputPositional/m-p/2304218#M76012</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2015-01-28T11:55:31Z</dc:date>
    </item>
  </channel>
</rss>

