Current Environment:
Talend Version: 8.0.1
Build ID: R2023-07
Hadoop Version: 2.8.0
Spark Version: 3.1
Talend Studio resides on one EC2 instance, while Spark runs on a single-node Hadoop cluster on another EC2 instance. Connectivity between the two instances currently works as expected (all required ports are allowed in the firewall and UFW). However, when running the Spark job to access the HDFS folder, I am hitting a NoClassDefFoundError.
I set up an HDFS connection and tested it successfully in a standard Talend job.
I then set up the same HDFS connection in a Talend Spark job, but the job fails with the errors below.
This error is shown in the Hadoop logs:
Exception in thread "main" java.lang.NoClassDefFoundError: routines/system/api/TalendJob
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:46)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:722)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:496)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:899)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:898)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.lang.ClassNotFoundException: routines.system.api.TalendJob
at java.lang.ClassLoader.findClass(ClassLoader.java:523)
at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:48)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 22 more
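The `Caused by` line shows the real root cause: `routines.system.api.TalendJob` is simply not visible to the classloader inside the YARN container. As a quick sanity check, I used a minimal standalone snippet (a diagnostic sketch, not part of the Talend job; the class name is taken from the stack trace above) that probes whether the class is resolvable on a given classpath:

```java
// Standalone classpath probe: Class.forName throws ClassNotFoundException
// when the named class is absent from the classpath -- the same root cause
// that surfaces above, wrapped in a NoClassDefFoundError at definition time.
public class ClassLoadCheck {
    public static boolean isVisible(String className) {
        try {
            // Resolve the class via the current thread's classloader chain.
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String target = "routines.system.api.TalendJob";
        if (isVisible(target)) {
            System.out.println(target + " found on classpath");
        } else {
            System.out.println(target + " missing from classpath");
        }
    }
}
```

Run on the Studio machine with the job's `lib` folder on the classpath it reports the class as found, so the jar exists locally; the question is why it is not shipped to, or not seen by, the YARN application master.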
This error is shown in the Talend Spark job log:
[ERROR] 07:50:33 org.apache.spark.deploy.yarn.Client- Application diagnostics message: Application application_ID failed 2 times due to AM Container for appattempt_ID exited with exitCode: 1
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_ID
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Container exited with a non-zero exit code 1
For more detailed output, check the application tracking page: http://ip-x.x.x.x:xxxx/cluster/app/application_ID Then click on links to logs of each attempt.
. Failing the application.
[ERROR] 07:50:33 <job_name>_0_1.<job_name>- TalendJob: '<job_name>' - Failed with exit code: 1.
org.apache.spark.SparkException: Application application_ID_0004 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1242)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1636)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at <project_name>.<job_name>_0_1.<job_name>.runClientJob(<job_name>.java:1353)
at <project_name>.<job_name>_0_1.<job_name>.runJobInTOS(<job_name>.java:987)
at <project_name>.<job_name>_0_1.<job_name>.main(<job_name>.java:857)
I would appreciate any advice, as I cannot see what I am missing; as far as I can tell, most of the required JAR files are already added to the job.