After a Spark Job was migrated from version 731 to version 801, running the migrated Job produced application logs at DEBUG level. For some large Spark executions, these logs reached up to 10GB. In the Spark Job design, the log4jLevel option is unchecked by default.
No log configuration is set by default for either the driver (spark.driver) or the executors (spark.executor), so the Spark Batch Job runs at DEBUG level by default.
To set it, open Run -> Spark Configuration -> Advanced properties (or the wizard, if the Spark configuration comes from the repository), then:
Add the property "spark.driver.extraJavaOptions" with value "-Dlog4j.configuration=/etc/spark/conf.cloudera.spark_on_yarn/log4j.properties"
Add the property "spark.executor.extraJavaOptions" with value "-Dlog4j.configuration=/etc/spark/conf.cloudera.spark_on_yarn/log4j.properties"
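For reference, when a Job is submitted directly with spark-submit instead of from the Studio, the same two properties can be passed with --conf. The Python below is a minimal sketch that assembles such a command. The application JAR my_job.jar is a hypothetical placeholder, and it is an assumption that this matches what the Studio generates internally.

```python
import shlex

# Default log4j configuration file on CDP, as given in this article.
LOG4J_PATH = "/etc/spark/conf.cloudera.spark_on_yarn/log4j.properties"


def log4j_conf_args(log4j_path: str) -> list[str]:
    """Build spark-submit --conf arguments that point the driver and executors at log4j_path."""
    java_opt = f"-Dlog4j.configuration={log4j_path}"
    args: list[str] = []
    for prop in ("spark.driver.extraJavaOptions", "spark.executor.extraJavaOptions"):
        args += ["--conf", f"{prop}={java_opt}"]
    return args


if __name__ == "__main__":
    # "my_job.jar" is a hypothetical application JAR, used only for illustration.
    cmd = ["spark-submit", "--master", "yarn", *log4j_conf_args(LOG4J_PATH), "my_job.jar"]
    print(shlex.join(cmd))
```

Passing the driver option at submit time, rather than setting spark.driver.extraJavaOptions in application code, matters in client mode: by the time application code runs, the driver JVM has already started.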
Note: /etc/spark/conf.cloudera.spark_on_yarn/log4j.properties is the default log4j configuration file provided on CDP. You can point both properties to a customized file to set the log levels you prefer. With these properties set, the driver and executors use the logger levels from that file when the Job runs on YARN.