Handling skewed data in Spark Batch jobs

J_Ruiz · 2022-11-23

Hello,

I've noticed that one Spark Batch job often gets stuck on the last task of some stages (199/200). While looking for the cause, I found articles about data skew and how to fix it, with Spark code examples.
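A stage stuck at 199/200 is the typical sign of a single straggler task: 200 is the default value of `spark.sql.shuffle.partitions`, and a shuffle sends every row with the same key to the same partition, so one very frequent key makes one task far bigger than the rest. The plain-Python sketch below (no Spark needed) illustrates this by hash-partitioning made-up keys into 200 buckets and reporting the largest one. The `crc32`-based partitioner, the function names, and the sample data are assumptions for illustration only; Spark SQL itself hashes with Murmur3, so real partition numbers will differ.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int = 200) -> int:
    """Hypothetical stand-in for a hash partitioner: deterministic hash of the key modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions: int = 200) -> Counter:
    """Count how many rows land in each shuffle partition."""
    return Counter(partition_of(k, num_partitions) for k in keys)


def skew_report(keys, num_partitions: int = 200):
    """Return (largest partition id, its row count, its share of all rows)."""
    sizes = partition_sizes(keys, num_partitions)
    pid, biggest = sizes.most_common(1)[0]
    return pid, biggest, biggest / len(keys)


if __name__ == "__main__":
    # Hypothetical data: 10,000 rows, 9,000 of which share the key "UNKNOWN".
    rows = ["UNKNOWN"] * 9000 + [f"customer_{i}" for i in range(1000)]
    pid, biggest, share = skew_report(rows)
    # One partition (the task that never finishes) holds roughly 90% of the data.
    print(f"partition {pid} holds {biggest} rows ({share:.0%} of the data)")
```

Counting rows per join or grouping key on the real input is a quick way to confirm whether a few keys are behind the slow task.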

Do Spark Batch jobs have some way to fix this kind of problem, such as broadcasting tables or salting?
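Both techniques can be illustrated without Spark. `broadcast_join` mimics a broadcast hash join: the small table is copied to every task as a lookup map, so the large side is never shuffled and a hot key cannot pile up in one partition. In Spark 2.4 this is what the `broadcast()` hint in `org.apache.spark.sql.functions` requests, and Spark chooses it automatically for tables below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). `salted_join` shows salting, for when neither table is small enough to broadcast: each row of the skewed side gets a random salt 0..N-1 added to its key, the other side is replicated once per salt, and the join runs on the (key, salt) pair, so one hot key is split into N smaller groups. The function names, `num_salts`, and the sample data are hypothetical; this is a sketch of the idea, not Talend's or Spark's implementation.

```python
import random
from collections import Counter, defaultdict


def broadcast_join(large, small):
    """Broadcast (map-side) hash join of two (key, value) lists.

    The small side becomes an in-memory lookup; the large side is streamed
    through it, so no shuffle of the large side is needed.
    """
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    return [(key, lv, sv) for key, lv in large for sv in lookup.get(key, [])]


def salted_join(large, small, num_salts=8, seed=0):
    """Salted shuffle join of two (key, value) lists.

    Returns the joined rows plus the number of large-side rows per salted key,
    so the spread of a hot key across salts can be inspected.
    """
    rng = random.Random(seed)
    # Skewed side: attach a random salt to every key.
    salted_large = [((key, rng.randrange(num_salts)), value) for key, value in large]
    # Other side: replicate each row once per salt so every salted key finds its match.
    salted_small = defaultdict(list)
    for key, value in small:
        for salt in range(num_salts):
            salted_small[(key, salt)].append(value)
    # Join on the composite (key, salt); in Spark these salted keys can hash to different partitions.
    joined = [(sk[0], lv, sv) for sk, lv in salted_large for sv in salted_small.get(sk, [])]
    load = Counter(sk for sk, _ in salted_large)
    return joined, load


if __name__ == "__main__":
    # Hypothetical data: one hot key ("UNKNOWN") dominates the large side.
    large = [("UNKNOWN", i) for i in range(1000)] + [("A", 1), ("B", 2)]
    small = [("UNKNOWN", "n/a"), ("A", "alpha"), ("B", "beta")]
    joined, load = salted_join(large, small, num_salts=8)
    # Same result as the plain join, but the hot key is now split across 8 groups.
    assert sorted(joined) == sorted(broadcast_join(large, small))
    print({salt: n for (key, salt), n in sorted(load.items()) if key == "UNKNOWN"})
```

The trade-off: broadcasting only works when one side fits in executor memory, while salting works for two large tables at the cost of replicating the non-skewed side `num_salts` times.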

I'm currently using Talend Big Data R2020-09-7.3.1 with CDH 6.3.2 and Spark 2.4.

Any help or take on the matter would be appreciated.

Thanks.

Anonymous · 2022-11-23

Hello,

Are you getting a java.lang.OutOfMemoryError when running the Talend Big Data Spark Batch job?

First, we'd like to check your job design so we can investigate this issue fully.

Also, do you have any execution logs from the Talend side (Studio or job server) and from the Hadoop side?

Best regards

Sabrina
