
Talend Spark Job error when running on Spark in a Hadoop cluster

Current environment:

Talend Version: 8.0.1
Build ID: R2023-07
Hadoop Version: 2.8.0
Spark Version: 3.1

Talend Studio resides on one EC2 instance, while Spark runs on a single-node Hadoop cluster on another EC2 instance. Connectivity between the two instances is working as expected (all required ports are allowed in the firewall and UFW). However, when I run the Spark Job to access the HDFS folder, I am facing a NoClassDefFoundError.

I set up an HDFS connection and tested it successfully in a standard Talend Job.

I then set up the same HDFS connection in a Talend Spark Job, but the job fails with the errors below.
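For reference, I believe the HDFS connectivity itself can also be verified outside Talend with a minimal Hadoop FileSystem client like this rough sketch (the namenode URI is a placeholder standing in for my actual fs.defaultFS):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder namenode URI; replace with the cluster's real fs.defaultFS value
        conf.set("fs.defaultFS", "hdfs://<namenode-host>:8020");
        try (FileSystem fs = FileSystem.get(conf)) {
            // List the HDFS root to prove the connection works end to end
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }
}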

This error is shown in the Hadoop logs:

Exception in thread "main" java.lang.NoClassDefFoundError: routines/system/api/TalendJob
   at java.lang.ClassLoader.defineClass1(Native Method)
   at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
   at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
   at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
   at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
   at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:46)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
   at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:722)
   at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:496)
   at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
   at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:899)
   at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:898)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
   at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
   at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.lang.ClassNotFoundException: routines.system.api.TalendJob
   at java.lang.ClassLoader.findClass(ClassLoader.java:523)
   at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
   at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:48)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
   ... 22 more
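If I read the trace correctly, the JAR containing routines.system.api.TalendJob (the Talend job runtime) never reaches the YARN ApplicationMaster's classpath. A tiny probe like the one below (a hypothetical helper class of mine, not part of the generated job) reproduces the same lookup the ApplicationMaster performs:

public class TalendClassProbe {
    public static void main(String[] args) {
        try {
            // The class name comes straight from the stack trace above
            Class.forName("routines.system.api.TalendJob");
            System.out.println("routines.system.api.TalendJob is on the classpath");
        } catch (ClassNotFoundException e) {
            // Same failure mode the ApplicationMaster is hitting
            System.out.println("Not found: " + e.getMessage());
        }
    }
}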

This error is shown in the Talend Spark Job log:

[ERROR] 07:50:33 org.apache.spark.deploy.yarn.Client- Application diagnostics message: Application application_ID failed 2 times due to AM Container for appattempt_ID exited with exitCode: 1
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_ID
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Container exited with a non-zero exit code 1
For more detailed output, check the application tracking page: http://ip-x.x.x.x:xxxx/cluster/app/application_ID Then click on links to logs of each attempt.
. Failing the application.
[ERROR] 07:50:33 <job_name>_0_1.<job_name>- TalendJob: '<job_name>' - Failed with exit code: 1.
org.apache.spark.SparkException: Application application_ID_0004 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1242)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1636)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at <project_name>.<job_name>_0_1.<job_name>.runClientJob(<job_name>.java:1353)
at <project_name>.<job_name>_0_1.<job_name>.runJobInTOS(<job_name>.java:987)
at <project_name>.<job_name>_0_1.<job_name>.main(<job_name>.java:857)

I would appreciate any advice, as I cannot see what I am missing; most of the required JAR files are already added to the job.
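My working assumption is that the JARs are present on the Studio side but are not being shipped to the YARN containers. A rough sketch like this (hypothetical, run inside the container environment) would dump the classpath the JVM actually sees, so the Talend runtime JAR can be confirmed present or absent:

public class ClasspathDump {
    public static void main(String[] args) {
        // Print every entry the JVM was launched with; the JAR containing
        // routines/system/api/TalendJob should appear somewhere in this list
        String classpath = System.getProperty("java.class.path");
        for (String entry : classpath.split(java.io.File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}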
