<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>DataPrep of Large File Fails with Read Timeout in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/DataPrep-of-Large-File-Fails-with-Read-Timeout/m-p/2337858#M106154</link>
    <description>&lt;P&gt;Hi&lt;/P&gt;&lt;P&gt;We have implemented a job that pulls 53m records with 11 columns and passes them through the tRunDataPrep component. The recipe is fairly simple - nothing more complex than uppercasing, removing whitespace, extracting numbers from fields, etc.&lt;/P&gt;&lt;P&gt;We are using a joblet to run the execution because we have 11 different sources and need a different schema per recipe; this keeps the configuration requirements minimal for each. We looked at Dynamic Schema, but it doesn't fit the way we need to orchestrate, so we are comfortable with this approach.&lt;/P&gt;&lt;P&gt;The job runs fine until it reaches about 6m records, at which point it times out with a socket error. I've attached images of the flow and the error and would appreciate some thoughts on a resolution. Following some research into JVM timeout parameters, I have just added:&lt;/P&gt;&lt;P&gt;-Dws_client_connection_timeout=180000&lt;/P&gt;&lt;P&gt;-Dws_client_receive_timeout=180000&lt;/P&gt;&lt;P&gt;I am rerunning now, but would appreciate any thoughts on this issue and how best to resolve it.&lt;/P&gt;&lt;P&gt;Attached are the flow and the error output.&lt;/P&gt;</description>
    <pubDate>Fri, 15 Nov 2024 21:59:14 GMT</pubDate>
    <dc:creator>gurn</dc:creator>
    <dc:date>2024-11-15T21:59:14Z</dc:date>
    <item>
      <title>DataPrep of Large File Fails with Read Timeout</title>
      <link>https://community.qlik.com/t5/Talend-Studio/DataPrep-of-Large-File-Fails-with-Read-Timeout/m-p/2337858#M106154</link>
      <description>&lt;P&gt;Hi&lt;/P&gt;&lt;P&gt;We have implemented a job that pulls 53m records with 11 columns and passes them through the tRunDataPrep component. The recipe is fairly simple - nothing more complex than uppercasing, removing whitespace, extracting numbers from fields, etc.&lt;/P&gt;&lt;P&gt;We are using a joblet to run the execution because we have 11 different sources and need a different schema per recipe; this keeps the configuration requirements minimal for each. We looked at Dynamic Schema, but it doesn't fit the way we need to orchestrate, so we are comfortable with this approach.&lt;/P&gt;&lt;P&gt;The job runs fine until it reaches about 6m records, at which point it times out with a socket error. I've attached images of the flow and the error and would appreciate some thoughts on a resolution. Following some research into JVM timeout parameters, I have just added:&lt;/P&gt;&lt;P&gt;-Dws_client_connection_timeout=180000&lt;/P&gt;&lt;P&gt;-Dws_client_receive_timeout=180000&lt;/P&gt;&lt;P&gt;I am rerunning now, but would appreciate any thoughts on this issue and how best to resolve it.&lt;/P&gt;&lt;P&gt;Attached are the flow and the error output.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2024 21:59:14 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/DataPrep-of-Large-File-Fails-with-Read-Timeout/m-p/2337858#M106154</guid>
      <dc:creator>gurn</dc:creator>
      <dc:date>2024-11-15T21:59:14Z</dc:date>
    </item>
    <item>
      <title>Re: DataPrep of Large File Fails with Read Timeout</title>
      <link>https://community.qlik.com/t5/Talend-Studio/DataPrep-of-Large-File-Fails-with-Read-Timeout/m-p/2337859#M106155</link>
      <description>&lt;P&gt;Hi&lt;/P&gt;&lt;P&gt;I have directed your question to our developers and hope they can get back to you soon.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Shong&lt;/P&gt;</description>
      <pubDate>Tue, 21 Mar 2023 06:47:48 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/DataPrep-of-Large-File-Fails-with-Read-Timeout/m-p/2337859#M106155</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-03-21T06:47:48Z</dc:date>
    </item>
    <item>
      <title>Re: DataPrep of Large File Fails with Read Timeout</title>
      <link>https://community.qlik.com/t5/Talend-Studio/DataPrep-of-Large-File-Fails-with-Read-Timeout/m-p/2337860#M106156</link>
      <description>&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have restructured the job and now extract the data into flat files in 2m-record chunks. These are then read by the next subjob and fed into tRunDataPrep. Everything now works and executes as expected.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It would be good to know whether there is a way to get this through without chunking the data, so please do post an update here when one is available.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Dave&lt;/P&gt;</description>
      <pubDate>Wed, 22 Mar 2023 17:00:20 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/DataPrep-of-Large-File-Fails-with-Read-Timeout/m-p/2337860#M106156</guid>
      <dc:creator>gurn</dc:creator>
      <dc:date>2023-03-22T17:00:20Z</dc:date>
    </item>
  </channel>
</rss>