Hello, I'm trying to set up a big data job and I need to add some packages to Spark. I'm trying to create an Iceberg data lakehouse on S3 Tables. Specifically, I need to define the equivalent of the following:
spark-shell \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,software.amazon.s3tables:s3-tables-catalog-for-iceberg-runtime:0.1.4 \
--conf spark.sql.catalog.s3tablesbucket=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.s3tablesbucket.catalog-impl=software.amazon.s3tables.iceberg.S3TablesCatalog \
--conf spark.sql.catalog.s3tablesbucket.warehouse=arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
I've tried to configure it in the Spark configuration tab by adding the property spark.jar.packages, and I've added the corresponding modules in Talend, but when I run the Job those packages are still missing.
What's wrong? Is there something more, or something different, that I should do?
As a side request: why don't you provide full support for Iceberg in standard Jobs as well?
Just to make sure there isn't a typo first: the property in the Spark configuration should be 'spark.jars.packages', with 'jars' in the plural.
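As a rough sketch of how your spark-shell flags would map to property/value pairs in the Spark configuration tab (the catalog name 's3tablesbucket' and the warehouse ARN are taken directly from your command; adjust them to your environment, and keep in mind downloading packages at runtime requires Maven access from the cluster):

spark.jars.packages = org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,software.amazon.s3tables:s3-tables-catalog-for-iceberg-runtime:0.1.4
spark.sql.catalog.s3tablesbucket = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.s3tablesbucket.catalog-impl = software.amazon.s3tables.iceberg.S3TablesCatalog
spark.sql.catalog.s3tablesbucket.warehouse = arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket
spark.sql.extensions = org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

Each --conf key=value from the command becomes one property/value entry, and the --packages list becomes the value of spark.jars.packages.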
If the JARs are not available to install as platform packages from Maven and instead have to be installed externally, it is always a good idea to keep them in a folder inside your Talend workspace at the time you run the Job, so Studio can locate the path to the downloaded JARs (see the sketch below).
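As a minimal sketch of that approach (the folder and file paths below are hypothetical placeholders; substitute the actual location of the JARs you downloaded into your workspace), you can point Spark at local files with the spark.jars property instead of resolving them from Maven:

spark.jars = C:/Talend/workspace/lib/iceberg-spark-runtime-3.5_2.12-1.6.1.jar,C:/Talend/workspace/lib/s3-tables-catalog-for-iceberg-runtime-0.1.4.jar

spark.jars takes a comma-separated list of JAR paths to add to the driver and executor classpaths, which avoids any runtime dependency on Maven downloads.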
If the above doesn't work, you can also: