vintac
Partner - Contributor III

How to add packages to Spark

Hello, I'm trying to set up a big data job and I need to add some packages to Spark. I'm trying to create an Iceberg data lakehouse on S3 Tables. Specifically, I need to define the equivalent of the following:

spark-shell \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,software.amazon.s3tables:s3-tables-catalog-for-iceberg-runtime:0.1.4 \
--conf spark.sql.catalog.s3tablesbucket=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.s3tablesbucket.catalog-impl=software.amazon.s3tables.iceberg.S3TablesCatalog \
--conf spark.sql.catalog.s3tablesbucket.warehouse=arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

I've tried to configure this in the Spark configuration tab by adding the property spark.jar.packages, and I've added the respective modules in Talend, but when I run the job those packages are still missing.
What's wrong? Is there something more, or something different, I should do?

As a side request: why don't you provide full support for Iceberg in standard jobs as well?

1 Reply
asin_artha
Partner - Contributor II

First, just to rule out a typo: the property in the Spark configuration should be 'spark.jars.packages', with 'jars' in the plural.
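With the corrected property name, the --packages and --conf flags from the spark-shell command in the original post would map to property/value pairs along these lines (a sketch based on the values above; the exact field layout in Talend's Spark configuration tab may differ):

```
spark.jars.packages = org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,software.amazon.s3tables:s3-tables-catalog-for-iceberg-runtime:0.1.4
spark.sql.catalog.s3tablesbucket = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.s3tablesbucket.catalog-impl = software.amazon.s3tables.iceberg.S3TablesCatalog
spark.sql.catalog.s3tablesbucket.warehouse = arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket
spark.sql.extensions = org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
```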

If the JARs are not available to be installed as platform packages from Maven and instead need to be installed externally, it is a good idea to keep them in a folder inside your Talend workspace when running the Job, so Studio can locate the path to the downloaded JARs.

If the above doesn't work, you can also:

  1. In the Basic settings of your tSparkSubmit (or equivalent) component, look for a field called Extra main classpath or Additional Java Classpath.
  2. Manually add the full paths to the downloaded JAR files there. To get the JARs, you can use Maven to download them.
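As an illustration of step 2, each Maven coordinate (group:artifact:version) maps to a predictable path on Maven Central, so you can fetch the JARs directly. The helper function below is a hypothetical sketch, not a Talend feature:

```shell
#!/bin/sh
# Build the Maven Central download URL for a group:artifact:version coordinate.
mvn_url() {
  coord=$1
  group=${coord%%:*}           # text before the first ':'
  rest=${coord#*:}
  artifact=${rest%%:*}         # text between the two ':'
  version=${rest#*:}           # text after the last ':'
  group_path=$(printf '%s' "$group" | tr . /)   # dots in the group become path separators
  printf 'https://repo1.maven.org/maven2/%s/%s/%s/%s-%s.jar\n' \
    "$group_path" "$artifact" "$version" "$artifact" "$version"
}

mvn_url org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1
# -> https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.6.1/iceberg-spark-runtime-3.5_2.12-1.6.1.jar
```

You could then download each JAR with something like `curl -O "$(mvn_url <coordinate>)"` and point the classpath field at the resulting files.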