<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Split a spark streaming job due to Java 65535 bytes limit in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Split-a-spark-streaming-job-due-to-Java-65535-bytes-limit/m-p/2347378#M114645</link>
    <description>&lt;P&gt;Hello&lt;/P&gt;&lt;P&gt;We have a Spark Streaming Talend job that consumes events in JSON format from Kafka and writes them to Hive. The input is a large JSON document with 500+ attributes, so the method generated for the subjob exceeds Java's 65535-byte limit.&lt;/P&gt;&lt;P&gt;I understand that the best way to work around this is to split the subjob, but that is not possible with a streaming job. Are there any suggestions/pointers for working around this?&lt;/P&gt;&lt;P&gt;We have the following flexibility, if any of this helps:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Split the single Hive table into two with a common key, so we can join data from the two tables when needed.&lt;/LI&gt;&lt;LI&gt;It is not necessary to maintain the order of the events when persisting to Hive.&lt;/LI&gt;&lt;LI&gt;Have the events sent as Avro instead of JSON (not tried, but we should be able to do that).&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Radhika&lt;/P&gt;</description>
    <pubDate>Fri, 15 Nov 2024 21:44:13 GMT</pubDate>
    <dc:creator>vradhik</dc:creator>
    <dc:date>2024-11-15T21:44:13Z</dc:date>
    <item>
      <title>Split a spark streaming job due to Java 65535 bytes limit</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Split-a-spark-streaming-job-due-to-Java-65535-bytes-limit/m-p/2347378#M114645</link>
      <description>&lt;P&gt;Hello&lt;/P&gt;&lt;P&gt;We have a Spark Streaming Talend job that consumes events in JSON format from Kafka and writes them to Hive. The input is a large JSON document with 500+ attributes, so the method generated for the subjob exceeds Java's 65535-byte limit.&lt;/P&gt;&lt;P&gt;I understand that the best way to work around this is to split the subjob, but that is not possible with a streaming job. Are there any suggestions/pointers for working around this?&lt;/P&gt;&lt;P&gt;We have the following flexibility, if any of this helps:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Split the single Hive table into two with a common key, so we can join data from the two tables when needed.&lt;/LI&gt;&lt;LI&gt;It is not necessary to maintain the order of the events when persisting to Hive.&lt;/LI&gt;&lt;LI&gt;Have the events sent as Avro instead of JSON (not tried, but we should be able to do that).&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Radhika&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2024 21:44:13 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Split-a-spark-streaming-job-due-to-Java-65535-bytes-limit/m-p/2347378#M114645</guid>
      <dc:creator>vradhik</dc:creator>
      <dc:date>2024-11-15T21:44:13Z</dc:date>
    </item>
    <item>
      <title>Re: Split a spark streaming job due to Java 65535 bytes limit</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Split-a-spark-streaming-job-due-to-Java-65535-bytes-limit/m-p/2347379#M114646</link>
      <description>&lt;P&gt;Hi&lt;/P&gt;&lt;P&gt;Take a look at these KB articles about the Java 65535 bytes limit error:&lt;/P&gt;&lt;P&gt;https://community.talend.com/s/article/Exceeding-the-Java-bytes-limit-1Z1UZ&lt;/P&gt;&lt;P&gt;https://community.talend.com/s/article/Building-a-Job-with-one-tExtractPositionalFields-component-fails-with-the-error-The-code-of-method-is-exceeding-the-bytes-limit-17gnl&lt;/P&gt;&lt;P&gt;https://community.talend.com/s/article/tMSSqlInput-Process-Map-String-Object-is-exceeding-the-bytes-limit-InMpE&lt;/P&gt;&lt;P&gt;The workaround is to optimize the Job so that the final generated code of each subjob is smaller. Try the following:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Minimize the number of components in the subjob.&lt;/LI&gt;&lt;LI&gt;Divide the subjob into several subjobs.&lt;/LI&gt;&lt;LI&gt;Reduce the number of columns.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;In your case, I think option 1 from your list (splitting the single Hive table into two) may be a solution worth trying.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Shong&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jun 2023 02:59:48 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Split-a-spark-streaming-job-due-to-Java-65535-bytes-limit/m-p/2347379#M114646</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-06-06T02:59:48Z</dc:date>
    </item>
  </channel>
</rss>

