Talend Data Preparation lets you connect directly to various types of databases and use that connection as a source for new datasets. This article explains how to add a new database type to Talend Data Preparation so that it can connect to Hive in Azure HDInsight.
HDInsight is a managed Hadoop service hosted in Azure. It deploys and provisions Apache Hadoop clusters in the cloud, providing a software framework designed to manage, analyze, and report on big data with high reliability and availability. HDInsight uses the Hortonworks Data Platform (HDP) Hadoop distribution.
Download the following JAR files:
Create a folder named jdbc-drivers in components_catalog_path/.m2/
Copy the JAR files to this folder:
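The two steps above (creating the jdbc-drivers folder and copying the JAR files into it) could be scripted roughly as follows. This is a minimal sketch, not part of the product: the function name stage_jdbc_jars is hypothetical, and it assumes the JARs are copied flat into the jdbc-drivers folder, as the steps describe.

```python
import shutil
from pathlib import Path


def stage_jdbc_jars(components_catalog_path, jar_files):
    """Create <components_catalog_path>/.m2/jdbc-drivers and copy JARs into it.

    Hypothetical helper: assumes a flat copy into the folder, as the
    article's steps describe. Returns the list of copied file paths.
    """
    target = Path(components_catalog_path) / ".m2" / "jdbc-drivers"
    # parents=True also creates .m2 if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for jar in map(Path, jar_files):
        if jar.suffix.lower() != ".jar":
            raise ValueError(f"not a JAR file: {jar}")
        # copy2 keeps timestamps and returns the destination path.
        copied.append(Path(shutil.copy2(jar, target)))
    return copied
```

Calling stage_jdbc_jars with the catalog directory and the list of downloaded JARs leaves the originals in place, so the step can be repeated safely.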
Update the file located at components_catalog_path/config/jdbc_config.json. For this article, the jdbc_config.json file is located as shown in the image below:
Edit the jdbc_config.json file as shown below:
path: List all the required JAR file details. The path follows the pattern mvn:jdbc-drivers/my_database_name/my_version.
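To make the path pattern concrete, here is a small sketch that assembles mvn:jdbc-drivers/my_database_name/my_version strings and the list of path objects that jdbc_config.json expects. The helper names driver_path and paths_section are hypothetical and only illustrate the pattern.

```python
def driver_path(database_name, version, group="jdbc-drivers"):
    """Build one path string following mvn:jdbc-drivers/<name>/<version>."""
    for label, part in (("database_name", database_name), ("version", version)):
        # A slash inside a segment would change the shape of the pattern.
        if not part or "/" in part:
            raise ValueError(f"invalid {label}: {part!r}")
    return f"mvn:{group}/{database_name}/{version}"


def paths_section(drivers):
    """Turn (name, version) pairs into the list used under the "paths" key."""
    return [{"path": driver_path(name, version)} for name, version in drivers]
```

For example, driver_path("libthrift", "0.9.3") yields "mvn:jdbc-drivers/libthrift/0.9.3".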
Sample entry for Hive, appended after the existing entries (note the leading comma):
,
{
  "id": "Hive HDInsight",
  "class": "org.apache.hive.jdbc.HiveDriver",
  "url": "jdbc:hive2://YOURCLUSTER.azurehdinsight.net:443/default;transportMode=http;ssl=true;httpPath=/hive2",
  "paths": [
    {"path": "mvn:jdbc-drivers/hive-jdbc-1.2.1000.2.6.0.3-8/6.4.0"},
    {"path": "mvn:jdbc-drivers/hive-service-1.2.1000.2.6.0.3-8/6.4.0"},
    {"path": "mvn:jdbc-drivers/libthrift/0.9.3"}
  ]
}
Click ADD DATASET and select Database.
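If Hive HDInsight does not show up as a database type at this point, one thing worth ruling out is a syntax error introduced while editing jdbc_config.json, such as a missing or extra comma. The sketch below parses the file and looks for the new entry. It assumes the file is a top-level JSON array of entries with the keys id, class, url, and paths, which matches the sample entry but is not confirmed here. The function name check_jdbc_config is hypothetical.

```python
import json

REQUIRED_KEYS = ("id", "class", "url", "paths")


def check_jdbc_config(config_file, expected_id):
    """Parse jdbc_config.json and return the entry whose id matches.

    Assumption: the file is a top-level JSON array of entry objects.
    """
    with open(config_file, encoding="utf-8") as handle:
        # json.load raises json.JSONDecodeError on malformed JSON.
        entries = json.load(handle)
    if not isinstance(entries, list):
        raise ValueError("expected a top-level JSON array of entries")

    for entry in entries:
        if isinstance(entry, dict) and entry.get("id") == expected_id:
            missing = [key for key in REQUIRED_KEYS if key not in entry]
            if missing:
                raise ValueError(f"entry {expected_id!r} lacks keys: {missing}")
            for item in entry["paths"]:
                path = item.get("path", "") if isinstance(item, dict) else ""
                if not str(path).startswith("mvn:"):
                    raise ValueError(f"unexpected path item: {item!r}")
            return entry
    raise LookupError(f"no entry with id {expected_id!r}")
```

Running check_jdbc_config against components_catalog_path/config/jdbc_config.json with "Hive HDInsight" either returns the entry or raises an error that points at the problem.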
You have a new option, Hive HDInsight, available for Database type. Select Hive HDInsight and provide the details below:
Password: The password you use to connect to the HDInsight cluster.
Click TEST CONNECTION.
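If the connection test fails, the URL registered in jdbc_config.json is worth double-checking, in particular the cluster name, port 443, and the transportMode, ssl, and httpPath parameters. As a rough aid, the sketch below splits a jdbc:hive2 URL of the shape used in this article into its parts. The function name parse_hive2_url is hypothetical, and the parser only handles this simple single-host form.

```python
def parse_hive2_url(url):
    """Split jdbc:hive2://host:port/db;key=value;... into its parts."""
    prefix = "jdbc:hive2://"
    if not url.startswith(prefix):
        raise ValueError("not a jdbc:hive2 URL")

    # Everything after the first ';' is a list of key=value parameters.
    location, _, raw_params = url[len(prefix):].partition(";")
    host_port, _, database = location.partition("/")

    host, _, port = host_port.rpartition(":")
    if not host:  # no explicit port in the URL
        host, port = host_port, ""

    params = {}
    for pair in filter(None, raw_params.split(";")):
        key, _, value = pair.partition("=")
        params[key] = value

    return {
        "host": host,
        "port": int(port) if port else None,
        "database": database,
        "params": params,
    }
```

Applied to the sample URL, the result makes it easy to spot a placeholder such as YOURCLUSTER that was never replaced with the real cluster name.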
A new dataset, hive_dataset, is created by connecting to Hive in HDInsight.
Talend Data Preparation is a powerful yet simple-to-use tool for creating datasets and preparations that deliver cleansed, structured, and enriched data to business users. This article explained how to add a database type, using Hive in HDInsight as the example, when working with Talend Data Preparation on premises.