Handling skew data in Spark Batches

J_Ruiz — Fri, 15 Nov 2024 22:20:20 GMT

Hello,

I've noticed that a certain Spark Batch Job regularly gets stuck on the 200th task of some stages (199/200). While looking for causes, I found articles about data skew and how to fix it, with Spark code examples.

Do Spark Batch Jobs have some way to fix this kind of problem, such as broadcasting tables or salting?
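
To make the salting idea concrete, here is a rough plain-Python sketch. It is not Spark or Talend code. It assumes a simple hash partitioner, a made-up data set with a single hot key, and a hypothetical salt range of 16. It only shows why one task out of 200 can end up with nearly all the rows, and how a random suffix on the key spreads them out.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 200  # matches the 200 tasks per stage seen in the job
NUM_SALTS = 16        # hypothetical salt range, to be tuned to the actual skew


def partition_of(key: str) -> int:
    """Deterministic stand-in for a hash partitioner (an assumption, not Spark's own)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def largest_partition(keys) -> int:
    """Row count of the fullest partition, i.e. the straggler task."""
    counts = Counter(partition_of(k) for k in keys)
    return max(counts.values())


def salt(key: str, rng: random.Random) -> str:
    """Append a random suffix so one hot key becomes up to NUM_SALTS distinct keys."""
    return f"{key}#{rng.randrange(NUM_SALTS)}"


rng = random.Random(42)

# Made-up data: one hot key with 100,000 rows plus 10,000 keys with one row each.
keys = ["hot"] * 100_000 + [f"k{i}" for i in range(10_000)]

plain_max = largest_partition(keys)
salted_max = largest_partition([salt(k, rng) for k in keys])

print("largest partition without salt:", plain_max)
print("largest partition with salt:   ", salted_max)
```

In a real join, the other side has to be replicated once per salt value so that every salted key still finds its match. Broadcasting is the alternative when one side is small enough to fit in memory on each executor, since the large side then does not need to be shuffled by key at all.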

I'm currently using Talend Big Data R2020-09-7.3.1 with CDH 6.3.2 and Spark 2.4.

Any help or take on the matter would be appreciated.

Thanks.

Re: Handling skew data in Spark Batches

Anonymous — Thu, 24 Nov 2022 04:19:17 GMT

Hello,

Are you getting a java.lang.OutOfMemoryError from Spark when executing the Talend Big Data Spark Batch Job?

First of all, we'd like to check your job design so that we can fully investigate this issue.

Also, is there any execution log from the Talend side (Studio or JobServer) and any log from the Hadoop side?

Best regards

Sabrina
