vradhik
Contributor

Split a Spark streaming job due to the Java 65535 bytes limit

Hello

We have a Spark streaming Talend job that consumes events in JSON format from Kafka and writes them to Hive. The input is a large JSON with 500+ attributes, and we are hitting the 64K byte limit on the method generated for the subjob.

I understand the best way to work around this is to split the subjob, but that is not possible with a streaming job. Are there any suggestions or pointers for working around this?

We have the following flexibility, if any of this helps:

  1. Split the single Hive table into two tables with a common key, so we can join the data from the two tables when needed.
  2. It is not necessary to maintain the order of the events when persisting to Hive.
  3. Have the events sent as Avro instead of JSON (not tried, but we should be able to do that).

Thanks

Radhika

1 Reply
Anonymous
Not applicable

Hi

Take a look at these KB articles about this Java 65535 bytes limit error.

https://community.talend.com/s/article/Exceeding-the-Java-bytes-limit-1Z1UZ

https://community.talend.com/s/article/Building-a-Job-with-one-tExtractPositionalFields-component-fails-with-the-error-The-code-of-method-is-exceeding-the-bytes-limit-17gnl

https://community.talend.com/s/article/tMSSqlInput-Process-Map-String-Object-is-exceeding-the-bytes-limit-InMpE
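For background, the root cause is a JVM class file constraint: the bytecode of a single method cannot exceed 65535 bytes, so the compiler reports "code too large" / "exceeding the 65535 bytes limit" once a generated method grows past that. Here is a small stand-alone illustration (the class name and statement count are made up purely for the demo):

import java.io.FileWriter;

public class CodeTooLargeDemo {
    public static void main(String[] args) throws Exception {
        // Generate a Java class whose single method compiles to far more than
        // 65535 bytes of bytecode, the same situation a very large generated subjob hits.
        StringBuilder src = new StringBuilder("public class Huge {\n    static void run() {\n");
        for (int i = 0; i < 20000; i++) {
            src.append("        System.out.println(").append(i).append(");\n");
        }
        src.append("    }\n}\n");
        try (FileWriter out = new FileWriter("Huge.java")) {
            out.write(src.toString());
        }
        // Compiling the generated file fails with: error: code too large
        //     javac Huge.java
    }
}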

The workaround is to optimize the Job so that the final generated code of each subjob is smaller. Try the following:

  • Minimize the number of components in the subjob.
  • Divide the subjob into several subjobs.
  • Reduce the number of columns.

In your case, I think option 1 may be worth trying; a rough sketch of that approach is below.
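
If option 1 means splitting the wide event into two Hive tables that share a common key, this is a rough sketch of the idea in plain Spark Structured Streaming (outside of Talend), not the generated Talend code itself. The broker, topic, paths, schema and column names are placeholders you would replace with your own:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.StructType;

public class SplitWideEventStream {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("split-wide-event-stream")
                .enableHiveSupport()
                .getOrCreate();

        // Schema of the wide JSON payload; in practice this would list all
        // 500+ attributes (or be derived from an Avro schema, as in option 3).
        StructType schema = new StructType()
                .add("event_key", "string")
                .add("attr_001", "string")
                .add("attr_002", "bigint");
                // ...remaining attributes

        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")
                .option("subscribe", "events")
                .load()
                .selectExpr("CAST(value AS STRING) AS json")
                .select(functions.from_json(functions.col("json"), schema).alias("e"))
                .selectExpr("e.*");

        // One half of the attributes goes to one table, the rest to another;
        // both keep event_key so the two tables can be joined when needed.
        Dataset<Row> partA = events.select("event_key", "attr_001" /*, ... */);
        Dataset<Row> partB = events.select("event_key", "attr_002" /*, ... */);

        // Written here as Parquet files under each table's location; pointing an
        // external Hive table at these paths is one way to expose them in Hive.
        partA.writeStream().format("parquet")
                .option("path", "/warehouse/events_part_a")
                .option("checkpointLocation", "/checkpoints/events_part_a")
                .start();

        partB.writeStream().format("parquet")
                .option("path", "/warehouse/events_part_b")
                .option("checkpointLocation", "/checkpoints/events_part_b")
                .start();

        spark.streams().awaitAnyTermination();
    }
}

Each of the two streams stays narrow, and the order of events across the two tables does not need to be preserved (your point 2), since they can be joined back on event_key.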

Regards

Shong