<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Spark corrupt remote block broadcast in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Spark-corrupt-remote-block-broadcast/m-p/2200439#M2927</link>
    <description>&lt;P&gt;Let me add some comments.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;OL&gt; 
 &lt;LI&gt;I’m using 10 worker nodes (AWS r5.4xlarge). Each instance has 128 GB of memory and 16 cores.&lt;/LI&gt; 
 &lt;LI&gt;As I said, there are about 30 tables. Some tables contain 10 MB or so; others (about 10) contain over 100 GB.&lt;/LI&gt; 
&lt;/OL&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I have also attached the Spark parameters.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Regards.&lt;/P&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Lumg"&gt;spark properties.xlsx&lt;/A&gt;</description>
    <pubDate>Wed, 27 Feb 2019 09:02:50 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2019-02-27T09:02:50Z</dc:date>
    <item>
      <title>Spark corrupt remote block broadcast</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Spark-corrupt-remote-block-broadcast/m-p/2200438#M2926</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I’m currently struggling with a Spark Job.&lt;/P&gt; 
&lt;P&gt;The job simply de-normalizes data from about 30 tables in Spark SQL (see the SQL).&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="spark job.PNG" style="width: 999px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M2pJ.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/142992iEB68796818C67903/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M2pJ.png" alt="0683p000009M2pJ.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;While the job is running, I encounter the error below:&lt;/P&gt; 
&lt;P&gt;[WARN ]: org.apache.spark.scheduler.TaskSetManager - Lost task 74.0 in stage 26.0 (TID 10248, ip-10-118-121-62.ap-northeast-1.compute.internal, executor 12): java.io.IOException: org.apache.spark.SparkException: corrupt remote block broadcast_116_piece0 of broadcast_116: -461336360 != 2000236512&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1350)&lt;/P&gt; 
&lt;P&gt;... (omitted)&lt;/P&gt; 
&lt;P&gt;Caused by: org.apache.spark.SparkException: corrupt remote block broadcast_116_piece0 of broadcast_116: -461336360 != 2000236512&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:167)&lt;/P&gt; 
&lt;P&gt;... (omitted)&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1343)&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ... 35 more&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;As this message shows, a remote broadcast block seems to have been corrupted for some unknown reason.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Can anyone see what is causing this?&lt;/P&gt; 
&lt;P&gt;Here are the properties and the full log for this message.&lt;/P&gt; 
&lt;P&gt;Any advice on this issue would be appreciated.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 06:28:58 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Spark-corrupt-remote-block-broadcast/m-p/2200438#M2926</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T06:28:58Z</dc:date>
    </item>
    <item>
      <title>Re: Spark corrupt remote block broadcast</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Spark-corrupt-remote-block-broadcast/m-p/2200439#M2927</link>
      <description>&lt;P&gt;Let me add some comments.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;OL&gt; 
 &lt;LI&gt;I’m using 10 worker nodes (AWS r5.4xlarge). Each instance has 128 GB of memory and 16 cores.&lt;/LI&gt; 
 &lt;LI&gt;As I said, there are about 30 tables. Some tables contain 10 MB or so; others (about 10) contain over 100 GB.&lt;/LI&gt; 
&lt;/OL&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I have also attached the Spark parameters.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Regards.&lt;/P&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Lumg"&gt;spark properties.xlsx&lt;/A&gt;</description>
      <pubDate>Wed, 27 Feb 2019 09:02:50 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Spark-corrupt-remote-block-broadcast/m-p/2200439#M2927</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-02-27T09:02:50Z</dc:date>
    </item>
  </channel>
</rss>

