Spark Big Data job in local mode - Configuring external hive metastore in Talend Studio

Spark Big Data job in local mode - Configuring external hive metastore

csapparapu — Sat, 16 Nov 2024 04:11:30 GMT

Hello,

My goal is to run a Spark Big Data batch job in Talend in local mode, without any third-party cluster or distribution.

I want to save the output to S3, but first I want to register that data as an external table in a Hive metastore.
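For reference, registering files on S3 as an external table is normally done with Hive-format DDL once Spark has Hive support enabled. The sketch below only builds that statement as a string. The database, table, column names and the `s3a://` bucket path are hypothetical placeholders, and Parquet is assumed as the file format.

```python
def external_table_ddl(database, table, columns, location, file_format="PARQUET"):
    """Build a Hive-format CREATE EXTERNAL TABLE statement for Spark SQL.

    `columns` is a list of (name, sql_type) pairs; identifiers are backtick-quoted.
    """
    if not columns:
        raise ValueError("at least one column is required")
    cols = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {cols}\n"
        ")\n"
        f"STORED AS {file_format}\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical database, table, schema and bucket, for illustration only.
    print(external_table_ddl(
        "analytics",
        "events",
        [("event_id", "BIGINT"), ("event_type", "STRING"), ("ts", "TIMESTAMP")],
        "s3a://my-bucket/warehouse/events/",
    ))
```

With a Hive-enabled Spark session, a statement like this would be run through Spark SQL. Because the table is external, dropping it later removes only the metastore entry, not the files on S3.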

I would like to use an external Hive metastore database. From spark-shell, I was able to connect to an external MySQL database as the metastore.
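For context, an external MySQL metastore is usually configured through the standard Hive JDO connection properties, passed to Spark with the `spark.hadoop.` prefix so that Spark copies them into the Hadoop configuration the Hive client reads. The sketch below assembles those properties and renders them as `--conf` arguments of the kind one might pass to `spark-shell`. The host, port, database name and credentials are hypothetical, and `com.mysql.cj.jdbc.Driver` is assumed because it is the driver class in MySQL Connector/J 8.0, the version that appears later in this thread.

```python
import shlex


def mysql_metastore_conf(host, port, database, user, password):
    """Spark properties that point the Hive catalog at an external MySQL metastore."""
    return {
        # Use the Hive catalog rather than Spark's in-memory catalog.
        "spark.sql.catalogImplementation": "hive",
        # Standard Hive/JDO connection settings; the spark.hadoop. prefix makes
        # Spark copy them into its Hadoop configuration.
        "spark.hadoop.javax.jdo.option.ConnectionURL": f"jdbc:mysql://{host}:{port}/{database}",
        "spark.hadoop.javax.jdo.option.ConnectionDriverName": "com.mysql.cj.jdbc.Driver",
        "spark.hadoop.javax.jdo.option.ConnectionUserName": user,
        "spark.hadoop.javax.jdo.option.ConnectionPassword": password,
    }


def as_conf_args(conf):
    """Render properties as shell-quoted --conf arguments for spark-shell."""
    return " ".join("--conf " + shlex.quote(f"{key}={value}") for key, value in conf.items())


if __name__ == "__main__":
    # Hypothetical host, database and credentials.
    print(as_conf_args(mysql_metastore_conf("localhost", 3306, "metastore", "hive", "secret")))
```

The MySQL Connector/J jar also has to be on the driver's classpath for the connection to work.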

I am having trouble setting the spark.sql.hive.metastore.jars property in the Spark configuration on the Run tab. I couldn't find any information about it in the documentation.
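For reference, the Spark 2.x documentation lists three kinds of value for `spark.sql.hive.metastore.jars`: `builtin` (the Hive 1.2.1 client bundled with Spark, in which case `spark.sql.hive.metastore.version` must be 1.2.1 or left undefined), `maven` (Hive jars of the version given in `spark.sql.hive.metastore.version`, downloaded from Maven), or a JVM classpath that includes all of Hive and its dependencies, including the correct Hadoop version. The sketch below encodes those rules. It assumes Spark 2.x (the jars in this thread point at Spark 2.2), and the helper name is hypothetical.

```python
BUILTIN_HIVE_VERSION = "1.2.1"  # Hive client bundled with Spark 2.x


def metastore_jar_settings(jars, version=None):
    """Return spark.sql.hive.metastore.* properties for one of the three modes.

    `jars` is "builtin", "maven", or a JVM classpath string.
    """
    if jars == "builtin":
        if version not in (None, BUILTIN_HIVE_VERSION):
            raise ValueError(
                f"builtin jars bundle Hive {BUILTIN_HIVE_VERSION}; got version {version}")
        conf = {"spark.sql.hive.metastore.jars": "builtin"}
    else:
        if not jars:
            raise ValueError("jars must be 'builtin', 'maven', or a non-empty classpath")
        if version is None:
            # Stricter than Spark, which would fall back to its default version:
            # this sketch requires the Hive version that matches the jars.
            raise ValueError("set the Hive version that matches the metastore jars")
        conf = {"spark.sql.hive.metastore.jars": jars}
    if version is not None:
        conf["spark.sql.hive.metastore.version"] = version
    return conf
```

For a local-mode job, `builtin` plus the MySQL driver on the classpath is often the simplest starting point, since it avoids assembling a Hive classpath by hand.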

Thanks for looking into this.

Chandana

Re: Spark Big Data job in local mode - Configuring external hive metastore

manodwhb — Mon, 04 Nov 2019 08:59:56 GMT

@csapparapu, you can define it as follows:

Define the advanced settings

Define Spark advanced settings in the Studio to read Spark 2.0 jar files in your cluster.

Procedure

1. In the Advanced properties table, click the plus symbol (+) to add a row.
2. In the Property column, enter "spark.sql.hive.metastore.jars" in double quotation marks. This parameter provides the names of the jar files to be used by your Spark Job, as well as their paths in your cluster.
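Since the reply says to enter the property name in double quotation marks, the Property and Value cells are presumably Java string expressions (Talend generates Java code). Under that assumption, the sketch below turns a property dict into quoted Property/Value pairs, escaping backslashes and double quotes the way a Java string literal requires. The helper names and the example value are hypothetical.

```python
def java_string_literal(value):
    """Quote a value as a Java string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def advanced_property_rows(conf):
    """Return (Property, Value) cell contents for each Spark property."""
    return [(java_string_literal(k), java_string_literal(str(v))) for k, v in conf.items()]


if __name__ == "__main__":
    # Hypothetical row: the built-in Hive client instead of an explicit jar list.
    for prop, value in advanced_property_rows({"spark.sql.hive.metastore.jars": "builtin"}):
        print(prop, value)
```

Escaping matters mainly for Windows paths, where every backslash in a classpath value has to be doubled inside a Java string.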

Re: Spark Big Data job in local mode - Configuring external hive metastore

csapparapu — Fri, 08 Nov 2019 22:29:27 GMT

@manodwhb,

Thanks for your reply. My question was which jar files to include. I have tried several sets of jar files, such as the one below, but I still cannot run Spark SQL in local mode.

Do you have an example of a Spark batch job in local mode with an external Hive metastore?

Should these jar files be comma-separated? In the list below I separated them with semicolons.

"file:///Users/abc/.m2/repository/mysql/mysql-connector-java/8.0.18/mysql-connector-java-8.0.18.jar;file:///Applications/TalendStudio-7.2.1/studio/configuration/.m2/repository/org/talend/libraries/hadoop-common-2.8.1/6.0.0/hadoop-common-2.8.1-6.0.0.jar;file:///Applications/TalendStudio-7.2.1/studio/configuration/.m2/repository/org/talend/libraries/spark-hive_2.11-2.2.0/6.0.0/spark-hive_2.11-2.2.0-6.0.0.jar;/Applications/TalendStudio-7.2.1/studio/configuration/.m2/repository/org/talend/libraries/hadoop-hdfs-2.6.0.2.2.0.0-2041/6.0.0/hadoop-hdfs-2.6.0.2.2.0.0-2041-6.0.0.jar;file:///Applications/TalendStudio-7.2.1/studio/configuration/.m2/repository/org/talend/libraries/hive-exec-2.1.0-talend-nolang3/6.0.0/hive-exec-2.1.0-talend-nolang3-6.0.0.jar;file:///Applications/TalendStudio-7.2.1/studio/configuration/.m2/repository/org/talend/libraries/hive-jdbc-2.1.0-amzn-0/6.0.0/hive-jdbc-2.1.0-amzn-0-6.0.0.jar"
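On the separator question: the Spark 2.x documentation describes the third kind of value as "a classpath in the standard format for the JVM", and in Spark 2.x's `HiveUtils` the value is split on `File.pathSeparator` (`:` on macOS and Linux, `;` on Windows), with each entry treated as a plain file path. If that holds for the Spark version Talend uses here, a semicolon-separated list of `file://` URIs on macOS would not be read as intended. The sketch below normalizes such entries into plain paths joined with `os.pathsep`. The function name is hypothetical, and it does not check whether the jars form a complete, version-consistent Hive client.

```python
import os
from urllib.parse import urlparse
from urllib.request import url2pathname


def metastore_classpath(entries, sep=os.pathsep):
    """Join jar locations into a JVM-style classpath string.

    Accepts plain paths or file: URIs. URIs are converted to plain paths,
    since each classpath entry is read as a file path, not as a URL.
    Blank entries are skipped.
    """
    paths = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue
        if entry.startswith("file:"):
            entry = url2pathname(urlparse(entry).path)
        paths.append(entry)
    return sep.join(paths)


if __name__ == "__main__":
    # A shortened, hypothetical version of the semicolon-separated list above.
    original = "file:///Users/abc/a.jar;/Users/abc/b.jar"
    print(metastore_classpath(original.split(";")))
```

Separately, the list above mixes `hadoop-common-2.8.1` with `hadoop-hdfs-2.6.0`, while the documentation asks for all of Hive and its dependencies, including the correct version of Hadoop, so a consistent set of versions matters as much as the separator.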