How to connect Hadoop Hive

    This example shows how to connect QlikView to Hadoop Hive using the JDBC Connector.



    First, apply all settings for the JDBC Connector as described.


    Then you can download a Cloudera demo VM:


    After starting the CentOS VM with Hadoop, you can install the Beeswax for Hive examples (via the Hue web app), which consist of two tables:


    Then start the Hive service, which listens on the default port 10000:


    /usr/bin/hive --service hiveserver


    Don't forget to find out the IP address of your VM (run ifconfig on the VM).
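
    Before moving to the client side, it can help to confirm that the Hive service port is actually reachable from the client machine. The following is only a minimal sketch: the class name HivePortCheck and the fallback address 192.168.56.101 are hypothetical, so pass the address that ifconfig reported on your VM.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class HivePortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat all as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical fallback address; replace it with the IP address of your VM.
        String host = args.length > 0 ? args[0] : "192.168.56.101";
        int port = 10000; // default Hive service port mentioned above
        boolean ok = isReachable(host, port, 3000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

    If the port is not reachable, the Hive service may not be running, or the network mode of the VM may be blocking access from the host.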


    The next steps are on the client side. Extract the attached file to a folder.


    This file is a collection of libraries we put together for this purpose. We also had to add a file META-INF/services/java.sql.Driver, containing the driver name org.apache.hadoop.hive.jdbc.HiveDriver, to the library hive-jdbc-0.7.1.jar.
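
    The META-INF/services/java.sql.Driver entry is what lets Java's service-provider mechanism find the driver class without an explicit Class.forName call. To check that a given jar really contains this entry, a small sketch like the following could be used. The class name DriverServiceCheck is hypothetical, and the jar path is assumed to be passed as an argument (or to sit in the current folder).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class DriverServiceCheck {
    static final String SERVICE_ENTRY = "META-INF/services/java.sql.Driver";

    /** Returns the driver class names listed in the jar, or an empty list if the entry is missing. */
    static List<String> registeredDrivers(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry entry = jar.getJarEntry(SERVICE_ENTRY);
            if (entry == null) {
                return names;
            }
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Provider files may contain '#' comments and blank lines; skip them.
                    int hash = line.indexOf('#');
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        names.add(line);
                    }
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the jar is in the current folder unless a path is given.
        String jarPath = args.length > 0 ? args[0] : "hive-jdbc-0.7.1.jar";
        System.out.println(registeredDrivers(jarPath));
    }
}
```

    For the jar described above, the printed list should contain org.apache.hadoop.hive.jdbc.HiveDriver; an empty list means the service entry is missing.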


    Now, add all Java libraries with the full path to the CLASSPATH variable:
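
    Typing every jar path by hand is error-prone, so the value for the CLASSPATH variable can also be generated. The sketch below is an illustration only: the class name ClasspathBuilder is hypothetical, and the folder is assumed to be the one you extracted the attached file to. It prints all .jar files in that folder with their full paths, joined by the platform's path separator (";" on Windows).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClasspathBuilder {
    /** Joins the absolute paths of all .jar files in the folder, sorted by path, with the separator. */
    static String buildClasspath(File folder, String separator) {
        File[] jars = folder.listFiles((dir, name) -> name.toLowerCase().endsWith(".jar"));
        if (jars == null) {
            throw new IllegalArgumentException("Not a readable folder: " + folder);
        }
        Arrays.sort(jars); // stable, predictable order
        List<String> paths = new ArrayList<>();
        for (File jar : jars) {
            paths.add(jar.getAbsolutePath());
        }
        return String.join(separator, paths);
    }

    public static void main(String[] args) {
        // Assumes the current folder unless the extraction folder is given as an argument.
        File folder = new File(args.length > 0 ? args[0] : ".");
        System.out.println(buildClasspath(folder, File.pathSeparator));
    }
}
```

    The printed string can then be pasted into the CLASSPATH environment variable.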













    Now connect to the Hive instance in QlikView and select your table:


    CUSTOM CONNECT TO "Provider=JDBCConnector_x64.dll;jdbc:hive://;XUserId=KVPKRRRNPLdIWSJOBDTA;XPassword=EdZQQRRNPLdIWSJOBTYA;";
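
    If the connection fails in QlikView, it can be useful to test the same driver outside of QlikView first. The following is a rough sketch under several assumptions: the class name HiveSmokeTest and the fallback address are hypothetical, the database name "default" and the empty user name and password are assumed to match the demo VM, and the Hive jars from the steps above must be on the CLASSPATH. It uses the jdbc:hive://host:port/database URL form of this driver generation and lists the available tables.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class HiveSmokeTest {
    /** Builds a Hive JDBC URL of the form jdbc:hive://host:port/database. */
    static String hiveUrl(String host, int port, String database) {
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical fallback address; replace it with the IP address of your VM.
        String host = args.length > 0 ? args[0] : "192.168.56.101";
        String url = hiveUrl(host, 10000, "default"); // "default" database is an assumption

        // The driver is located through META-INF/services/java.sql.Driver in the jar,
        // so no explicit Class.forName call is made here.
        try (Connection con = DriverManager.getConnection(url, "", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

    A "No suitable driver" error at this point usually means the driver jar is missing from the CLASSPATH or lacks the service entry.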



    See the result in the attached QVW file.


    Don't hesitate to ask if you have trouble with this tutorial. Any notes or comments are welcome.


    - Ralf


    Update: I collected all the jar files needed for Hive 0.8.1, which is distributed with the new Cloudera distribution CDH4. You can use this setup:


    Update 2: I collected all the jar files needed for Hive 0.9 (see attachments). There is still a UTF-8 issue in Hive 0.9. Ask me for a fix.


    Update 3: We are working hard to figure out whether and how we can use JDBC with Cloudera Impala. Maybe something will come up soon.


    Update 4: We now have a first version of a Cloudera Impala JDBC driver (3 months before Cloudera itself will release one), which uses the Beeswax API. Just contact me if you need a trial version. Impala has an amazing response time!