We have been facing severe issues connecting to our Cloudera cluster from a Talend Big Data Spark job; we keep getting the error shown in the log below.
Our job is submitted to Spark, but we are wondering whether we are missing any Spark configuration parameters on the Talend end.
Talend version: 6.3.1
Cloudera version: 5.12
Any suggestions would be of great help.
Thank you
Starting job test_spark at 01:42 24/08/2017.
[statistics] connecting to socket on port 3728
[statistics] connected
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Talend/6.3.1/Talend-Studio-20161216_1026-V6.3.1/Talend-Studio-20161216_1026-V6.3.1/workspace/.Java/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Talend/6.3.1/Talend-Studio-20161216_1026-V6.3.1/Talend-Studio-20161216_1026-V6.3.1/workspace/.Java/lib/talend-spark-assembly-1.6.0-cdh5.8.1-hadoop2.6.0-cdh5.8.1-with-hive.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[WARN ]: org.apache.spark.SparkConf - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
[WARN ]: org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[ERROR]: org.apache.spark.SparkContext - Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at big_data.test_spark_0_1.test_spark.runJobInTOS(test_spark.java:1487)
at big_data.test_spark_0_1.test_spark.main(test_spark.java:1374)
[WARN ]: org.apache.spark.metrics.MetricsSystem - Stopping a MetricsSystem that is not running
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at big_data.test_spark_0_1.test_spark.runJobInTOS(test_spark.java:1487)
at big_data.test_spark_0_1.test_spark.main(test_spark.java:1374)
Exception in thread "main" java.lang.RuntimeException: TalendJob: 'test_spark' - Failed with exit code: 1.
at big_data.test_spark_0_1.test_spark.main(test_spark.java:1384)
[ERROR]: big_data.test_spark_0_1.test_spark - TalendJob: 'test_spark' - Failed with exit code: 1.
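For context, the hedged sketch below shows roughly what a generated Talend job does when it builds its context in yarn-client mode, and the Spark 1.6 properties that most often matter for this exception. The property names are standard Spark/Hadoop ones, but every host name, port, and value here is a placeholder assumption, not something taken from this thread; in Studio the same key/value pairs would go into the job's Spark Configuration tab under Advanced properties.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkYarnClientSketch {
    public static void main(String[] args) {
        // Minimal yarn-client setup; all endpoints below are placeholders.
        SparkConf conf = new SparkConf()
                .setAppName("test_spark")
                .setMaster("yarn-client")
                // HDFS NameNode: if this does not match the cluster, the staging
                // upload for the ApplicationMaster fails before the job starts.
                .set("spark.hadoop.fs.defaultFS", "hdfs://namenode.example.com:8020")
                // YARN ResourceManager endpoints the client submits to.
                .set("spark.hadoop.yarn.resourcemanager.address", "rm.example.com:8032")
                .set("spark.hadoop.yarn.resourcemanager.scheduler.address", "rm.example.com:8030")
                // ApplicationMaster memory: requests above the cluster's container
                // limits also surface as "unable to launch application master".
                .set("spark.yarn.am.memory", "512m");

        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println("Spark version: " + sc.version());
        sc.stop();
    }
}

If the context still fails with the same exception, the YARN ResourceManager UI (or yarn logs -applicationId <application id>) usually shows why the ApplicationMaster exited.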
Hello,
Is it a Spark batch job? Is your cluster correctly configured? Is your connection defined in the repository? More information would help us address your issue; screenshots are preferred.
Note: Please mask your sensitive data.
Best regards
Sabrina
Hi xdshi,
Yes, it is a Spark batch job.
My cluster is configured correctly; I have attached a screenshot for reference.
I am able to run a MapReduce job using this same cluster configuration (I am using the Cloudera distribution), but I hit this issue only when running a Spark batch job.
I have attached screenshots showing my job, the Spark configuration, and the error. Please help me out.
Hi,
I am suffering from the same problem. Can you please tell me how you solved it?
Best Regards
Can you please tell us how you solved this problem?
Hi siddarthaartha,
Could you share your solution to this problem please?
Currently, I am facing the same problem as you have mentioned.
Thanks.