Qlik Talend Big Data: Talend version 8 Spark Batch Job execution generates huge size of debug log on CDP 7.x

wei_guo (Support)

Last Update:

Oct 8, 2024 2:36:33 AM

Updated By:

Shicong_Hong

Created date:

Oct 8, 2024 2:38:29 AM

After migrating a Spark Job from version 7.3.1 to version 8.0.1, the migrated Spark Job execution generates an application log at DEBUG level. For large Spark Job executions, this can produce up to 10 GB of logs. In the Spark Job design, the log4jLevel option is unchecked by default.

 

Cause

No log configuration is set by default for either spark.driver or spark.executor, so the Spark Batch Job executes at DEBUG level by default.

 

Resolution

In Run -> Spark Configuration -> Advanced properties (or in the connection wizard, if the configuration is stored in the repository):

Add the property "spark.driver.extraJavaOptions" with the value "-Dlog4j.configuration=/etc/spark/conf.cloudera.spark_on_yarn/log4j.properties"

Add the property "spark.executor.extraJavaOptions" with the value "-Dlog4j.configuration=/etc/spark/conf.cloudera.spark_on_yarn/log4j.properties"

Note: /etc/spark/conf.cloudera.spark_on_yarn/log4j.properties is the default log4j configuration file provided on CDP; you can also point to a custom file to set the log levels you prefer. This changes the effective logger configuration when the Job executes on YARN.
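If you prefer a custom configuration file instead of the CDP default, a minimal log4j.properties along the following lines raises the root level to WARN. This is a sketch based on the standard log4j 1.x properties syntax used by Spark's default template; the file path and appender name are illustrative, not values mandated by the article:

```properties
# Example custom log4j.properties (hypothetical path, e.g. /etc/spark/custom/log4j.properties)
# Root logger at WARN instead of DEBUG, writing to the console appender
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

If you use a custom file, reference it in both spark.driver.extraJavaOptions and spark.executor.extraJavaOptions, and make sure the file is readable at that path on every node where the driver and executors run.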

 

Environment
