<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Custom Hadoop Distribution support to Spark components in Talend in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Custom-Hadoop-Distribution-support-to-Spark-components-in-Talend/m-p/2370692#M133702</link>
    <description>&lt;P&gt;I am working with a cluster where we have a custom Hadoop 2.4. I am trying to use Talend with the Spark components. For the &lt;A href="http://sparkconnect.com.au/" target="_self" rel="nofollow noopener noreferrer"&gt;Spark Connection&lt;/A&gt; components, I have set the relevant SparkHost and SparkHome.&lt;/P&gt; 
&lt;P&gt;For the distribution, the two available options are Cloudera and Custom (unsupported). When the Custom (unsupported) distribution is selected, there is a provision to choose the custom Hadoop version so that the relevant libraries are included. The options available here are: Cloudera, HortonWorks, MapR, Apache, Amazon EMR, PivotalHD. However, when I choose Cloudera it comes with Hadoop 2.3, and I am assuming that essential libraries are missing; hence I get a "NoClassDefFoundError", which means I am not able to load a file in Spark via this Spark connection. By the way, the Spark version I have is 1.0.0.&lt;/P&gt; 
&lt;P&gt;I would like to know how to fix this and find a way to get this version of Spark running with &lt;A href="https://goo.gl/Xstr4D" target="_blank" rel="nofollow noopener noreferrer"&gt;Hadoop Certification&lt;/A&gt;.&lt;/P&gt; 
&lt;P&gt;The error is copied and pasted below:&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;PRE&gt;[statistics] connecting to socket on port 3637
[statistics] connected
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/api/java/JavaSparkContext
    at sparktest.sparktest_0_1.sparktest.tSparkConnection_2Process(sparktest.java:491)
    at sparktest.sparktest_0_1.sparktest.runJobInTOS(sparktest.java:1643)
    at sparktest.sparktest_0_1.sparktest.main(sparktest.java:1502)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.api.java.JavaSparkContext
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 3 more
[statistics] disconnected
Job sparktest ended at 13:19 21/10/2014. [exit code=1]&lt;/PRE&gt; 
&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Sat, 16 Nov 2024 08:24:13 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2024-11-16T08:24:13Z</dc:date>
    <item>
      <title>Custom Hadoop Distribution support to Spark components in Talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Custom-Hadoop-Distribution-support-to-Spark-components-in-Talend/m-p/2370692#M133702</link>
      <description>&lt;P&gt;I am working with a cluster where we have a custom Hadoop 2.4. I am trying to use Talend with the Spark components. For the &lt;A href="http://sparkconnect.com.au/" target="_self" rel="nofollow noopener noreferrer"&gt;Spark Connection&lt;/A&gt; components, I have set the relevant SparkHost and SparkHome.&lt;/P&gt; 
&lt;P&gt;For the distribution, the two available options are Cloudera and Custom (unsupported). When the Custom (unsupported) distribution is selected, there is a provision to choose the custom Hadoop version so that the relevant libraries are included. The options available here are: Cloudera, HortonWorks, MapR, Apache, Amazon EMR, PivotalHD. However, when I choose Cloudera it comes with Hadoop 2.3, and I am assuming that essential libraries are missing; hence I get a "NoClassDefFoundError", which means I am not able to load a file in Spark via this Spark connection. By the way, the Spark version I have is 1.0.0.&lt;/P&gt; 
&lt;P&gt;I would like to know how to fix this and find a way to get this version of Spark running with &lt;A href="https://goo.gl/Xstr4D" target="_blank" rel="nofollow noopener noreferrer"&gt;Hadoop Certification&lt;/A&gt;.&lt;/P&gt; 
&lt;P&gt;The error is copied and pasted below:&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;PRE&gt;[statistics] connecting to socket on port 3637
[statistics] connected
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/api/java/JavaSparkContext
    at sparktest.sparktest_0_1.sparktest.tSparkConnection_2Process(sparktest.java:491)
    at sparktest.sparktest_0_1.sparktest.runJobInTOS(sparktest.java:1643)
    at sparktest.sparktest_0_1.sparktest.main(sparktest.java:1502)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.api.java.JavaSparkContext
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 3 more
[statistics] disconnected
Job sparktest ended at 13:19 21/10/2014. [exit code=1]&lt;/PRE&gt; 
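&lt;P&gt;For reference, a minimal standalone sketch of roughly what the generated job does when it opens the Spark context; it only needs the Spark 1.0.0 assembly jar on the classpath to get past the NoClassDefFoundError shown above. The master URL, class name and file path below are placeholders, not values taken from the job.&lt;/P&gt; 
&lt;PRE&gt;// Minimal sketch, not the Talend-generated code: verifies that the Spark 1.0.0
// classes (org.apache.spark.api.java.JavaSparkContext) are on the classpath
// and that a context can be created against the standalone master.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkClasspathCheck {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("SparkClasspathCheck")
                .setMaster("spark://your-spark-host:7077"); // placeholder SparkHost
        JavaSparkContext sc = new JavaSparkContext(conf);
        // placeholder path; any readable file works for this check
        JavaRDD lines = sc.textFile("hdfs:///tmp/sample.txt");
        System.out.println("line count = " + lines.count());
        sc.stop();
    }
}&lt;/PRE&gt; 
&lt;P&gt;If this compiles and runs with the spark-assembly-1.0.0 jar on the classpath, the problem is the jar set shipped by Studio for the selected distribution rather than the cluster itself.&lt;/P&gt; 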
&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 08:24:13 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Custom-Hadoop-Distribution-support-to-Spark-components-in-Talend/m-p/2370692#M133702</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T08:24:13Z</dc:date>
    </item>
    <item>
      <title>Re: Custom Hadoop Distribution support to Spark components in Talend</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Custom-Hadoop-Distribution-support-to-Spark-components-in-Talend/m-p/2370693#M133703</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt; 
&lt;P&gt;Could you please indicate which Talend build version you are seeing this issue on? There is a related JIRA issue, &lt;A title="https://jira.talendforge.org/browse/TBD-3774" href="https://jira.talendforge.org/browse/TBD-3774" target="_self" rel="nofollow noopener noreferrer"&gt;https://jira.talendforge.org/browse/TBD-3774&lt;/A&gt;, about "spark job can't work with HDP2.3".&lt;/P&gt; 
&lt;P&gt;This issue has been fixed in &lt;A title="6.1.2 " href="https://jira.talendforge.org/issues/?jql=project+%3D+TBD+AND+fixVersion+%3D+6.1.2" target="_blank" rel="nofollow noopener noreferrer"&gt;6.1.2&lt;/A&gt; and &lt;A title="6.2.1 " href="https://jira.talendforge.org/issues/?jql=project+%3D+TBD+AND+fixVersion+%3D+6.2.1" target="_blank" rel="nofollow noopener noreferrer"&gt;6.2.1&lt;/A&gt;.&lt;/P&gt; 
&lt;P&gt;Best regards&lt;/P&gt; 
&lt;P&gt;Sabrina&lt;/P&gt;</description>
      <pubDate>Thu, 19 Apr 2018 08:48:57 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Custom-Hadoop-Distribution-support-to-Spark-components-in-Talend/m-p/2370693#M133703</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-04-19T08:48:57Z</dc:date>
    </item>
  </channel>
</rss>

