J_Ruiz
Contributor II

Handling skewed data in Spark Batch Jobs

Hello,

I've noticed that a certain Spark Batch Job regularly gets stuck on the 200th task of some stages (199/200 completed). While looking for the cause, I found articles about data skew and how to fix it, with Spark code examples.

Do Apache Spark Batch Jobs offer some way to fix this kind of problem, such as broadcasting tables or salting?
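
For context, salting means appending a random suffix to a heavily repeated key so that its rows spread over several shuffle partitions instead of landing in a single one. The following is only a minimal plain-Python sketch of that idea, not Spark or Talend code. The partition count of 200 matches the default of spark.sql.shuffle.partitions; the salt range, the sample data, and the names partition_of and partition_sizes are hypothetical, and crc32 is just a stand-in for Spark's real partitioner.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 200  # default value of spark.sql.shuffle.partitions
NUM_SALTS = 20        # hypothetical salt range; tune it to the observed skew


def partition_of(key: str) -> int:
    """Deterministic stand-in for a hash partitioner (not Spark's actual hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys):
    """Count how many rows end up in each partition."""
    return Counter(partition_of(k) for k in keys)


rng = random.Random(42)

# Skewed input: one hot key with 100,000 rows plus 1,000 keys with 10 rows each.
keys = ["hot"] * 100_000 + [f"k{i}" for i in range(1_000) for _ in range(10)]
plain = partition_sizes(keys)

# Salting: a random suffix turns the hot key into up to NUM_SALTS distinct keys.
salted_keys = [f"{k}_{rng.randrange(NUM_SALTS)}" for k in keys]
salted = partition_sizes(salted_keys)

print("largest partition without salt:", max(plain.values()))
print("largest partition with salt:   ", max(salted.values()))
```

In a join, the other side would have to be replicated once per salt value so that every salted key still finds its match. Broadcasting avoids the shuffle altogether, but only when one side of the join is small enough to fit in memory on each executor.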

I'm currently using Talend Big Data R2020-09-7.3.1 with CDH 6.3.2 and Spark 2.4.

Any help or take on the matter would be appreciated.

Thanks.

1 Reply
Anonymous

Hello,

Are you getting a java.lang.OutOfMemoryError from Spark when executing the Talend Big Data Spark Batch Job?

First of all, we'd like to check your Job design so that we can fully investigate this issue.

Also, is there any execution log from the Talend side (Studio or JobServer) and any log from the Hadoop side?

Best regards

Sabrina