Hello, I'm trying to set up a big data job and I need to add some packages to Spark. I'm trying to create an Iceberg data lakehouse on S3 Tables. Specifically, I need to define the equivalent of the following:
spark-shell \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,software.amazon.s3tables:s3-tables-catalog-for-iceberg-runtime:0.1.4 \
--conf spark.sql.catalog.s3tablesbucket=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.s3tablesbucket.catalog-impl=software.amazon.s3tables.iceberg.S3TablesCatalog \
--conf spark.sql.catalog.s3tablesbucket.warehouse=arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
I've tried to configure it in the Spark configuration tab by adding the property spark.jar.packages, and I've added the corresponding modules in Talend, but when I run the Job those packages are still missing.
What's wrong? Is there something more, or something different, that I should do?
As a side request: why don't you provide full support for Iceberg in standard Jobs as well?
Just to make sure there isn't a typo first: the property in the Spark configuration should be 'spark.jars.packages', with 'jars' in the plural.
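As a rough sketch of how your spark-shell flags would map to property/value pairs in the Spark configuration tab (the catalog name 's3tablesbucket' and the warehouse ARN are taken directly from your command; adjust them to your environment, and keep in mind downloading packages at runtime requires Maven access from the cluster):

spark.jars.packages = org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,software.amazon.s3tables:s3-tables-catalog-for-iceberg-runtime:0.1.4
spark.sql.catalog.s3tablesbucket = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.s3tablesbucket.catalog-impl = software.amazon.s3tables.iceberg.S3TablesCatalog
spark.sql.catalog.s3tablesbucket.warehouse = arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket
spark.sql.extensions = org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

Each --conf key=value from the command becomes one property/value entry, and the --packages list becomes the value of spark.jars.packages.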
If the JARs are not available to install as platform packages from Maven and instead have to be installed externally, it is always a good idea to keep them in a folder inside your Talend workspace at the time you run the Job, so Studio can locate the path to the downloaded JARs (see the sketch below).
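As a minimal sketch of that approach (the folder and file paths below are hypothetical placeholders; substitute the actual location of the JARs you downloaded into your workspace), you can point Spark at local files with the spark.jars property instead of resolving them from Maven:

spark.jars = C:/Talend/workspace/lib/iceberg-spark-runtime-3.5_2.12-1.6.1.jar,C:/Talend/workspace/lib/s3-tables-catalog-for-iceberg-runtime-0.1.4.jar

spark.jars takes a comma-separated list of JAR paths to add to the driver and executor classpaths, which avoids any runtime dependency on Maven downloads.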
If the above doesn't work, you can also: