Anonymous
Not applicable

Spark corrupt remote block broadcast

Hi all,

 

Currently, I’m struggling with a Spark job.

I’m de-normalizing data across about 30 tables in Spark SQL (see the SQL below).

 

[Image: the de-normalization SQL query]

 

During the job, I’m encountering the error below:

[WARN ]: org.apache.spark.scheduler.TaskSetManager - Lost task 74.0 in stage 26.0 (TID 10248, ip-10-118-121-62.ap-northeast-1.compute.internal, executor 12): java.io.IOException: org.apache.spark.SparkException: corrupt remote block broadcast_116_piece0 of broadcast_116: -461336360 != 2000236512

            at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1350)

            ... (omitted)

Caused by: org.apache.spark.SparkException: corrupt remote block broadcast_116_piece0 of broadcast_116: -461336360 != 2000236512

            at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:167)

            ... (omitted)

            at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)

            at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1343)

            ... 35 more

 

As this message shows, a remote broadcast block seems to be corrupted for some unknown reason.

 

Do you know what is causing this issue?

I have attached the Spark properties and the full log for this message.

Any advice would be appreciated.
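In case it helps, the checksum mismatch in the error points at a corrupted TorrentBroadcast block, which often shows up when large broadcast blocks are fetched into memory under pressure. A common mitigation (a suggestion from my side, not something confirmed for this cluster) is to stop broadcasting join tables and to spill large remote block fetches to disk. These are standard Spark properties; the threshold values are illustrative:

```
# Disable broadcast (map-side) joins entirely, so no join table is broadcast
spark.sql.autoBroadcastJoinThreshold    -1

# Fetch remote blocks larger than this to disk instead of memory
# (available in Spark 2.2+; illustrative size)
spark.maxRemoteBlockSizeFetchToMem      200m
```

With `spark.sql.autoBroadcastJoinThreshold` set to `-1`, all joins fall back to shuffle-based joins, which is slower but avoids shipping broadcast blocks between executors altogether.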

 

1 Reply
Anonymous
Not applicable
Author

Let me add some comments.

 

  1. I’m using 10 worker nodes (AWS r5.4xlarge). Each instance has 128 GB of memory and 16 cores.
  2. As I said, there are about 30 tables. Some are around 10 MB, while roughly 10 others exceed 100 GB.
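For reference, on r5.4xlarge nodes (16 vCPUs / 128 GB each) a typical executor layout leaves headroom for the OS and overhead rather than giving executors all of the memory; tight memory can contribute to fetch failures like the one above. The numbers below are only an illustrative starting point, assuming YARN with 10 worker nodes and 3 executors per node:

```
# Illustrative sizing: 3 executors per node x 10 nodes
spark.executor.instances        30
spark.executor.cores            5
spark.executor.memory           34g
spark.executor.memoryOverhead   4g
```

Per node this reserves roughly 114 GB (3 x 38 GB) for executors, leaving the remainder for the OS and other daemons.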

 

 

I have also attached the Spark parameters.

 

Regards.


spark properties.xlsx