_AnonymousUser
Specialist III

tHiveInput throws an exception: java.io.IOException

Hello,
I am trying to execute a simple Hive query (a SELECT statement) from Talend. The Hive connection succeeds, but the job fails when it tries to execute the query.
ENVIRONMENT:
Talend Big Data version: 5.2.0, on Windows XP
Connecting to Apache 1.0.0 (Hive 0.9.0), connection mode: embedded
Thanks in advance for your help!
Exception:
java.io.IOException: Cannot run program "null/bin/hadoop" (in directory "C:\Talend\BigData\TOS_BD-r92826-V5.2.0"): CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(Unknown Source)
at java.lang.Runtime.exec(Unknown Source)
at java.lang.Runtime.exec(Unknown Source)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:267)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:191)
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:187)
at lc_d_a.hivepreprod_0_1.HivePreProd.tHiveInput_1Process(HivePreProd.java:702)
at lc_d_a.hivepreprod_0_1.HivePreProd.tHiveConnection_1Process(HivePreProd.java:447)
at lc_d_a.hivepreprod_0_1.HivePreProd.runJobInTOS(HivePreProd.java:1759)
at lc_d_a.hivepreprod_0_1.HivePreProd.main(HivePreProd.java:1624)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(Unknown Source)
at java.lang.ProcessImpl.start(Unknown Source)
... 15 more
5 Replies
Anonymous
Not applicable

Hi,
Cannot run program "null/bin/hadoop"

From the error message, please make sure the environment variables are set correctly.
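One way the literal prefix "null" can end up in a path like this is Java string concatenation with an unset environment variable. The sketch below is only an illustration under that assumption: the class name HadoopBinPath is hypothetical, and the idea that the launcher path is built from HADOOP_HOME plus "/bin/hadoop" is a guess based on the error text, not something confirmed in this thread.

```java
// Sketch: how an unset environment variable can become the literal path "null/bin/hadoop".
// Assumption (not confirmed here): the launcher path is HADOOP_HOME + "/bin/hadoop".
class HadoopBinPath {
    static String resolve(String hadoopHome) {
        // Java string concatenation renders a null reference as the text "null".
        return hadoopHome + "/bin/hadoop";
    }

    public static void main(String[] args) {
        // System.getenv returns null when the variable is not set.
        String hadoopHome = System.getenv("HADOOP_HOME");
        System.out.println("HADOOP_HOME = " + hadoopHome);
        System.out.println("launcher    = " + resolve(hadoopHome));
    }
}
```

Running this on the machine that executes the job shows whether HADOOP_HOME is visible to the JVM at all.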
Best regards
Sabrina
_AnonymousUser
Specialist III
Author

Thanks for your reply.
Actually, the environment variables are not the problem.
It seems that neither tHiveConnection nor tHiveInput connects to the remote server, even though I specify the host and port for the remote connection. Instead, the job tries to execute the query locally on my Windows workstation.
How can I tell Talend that it needs to connect to a remote Hive Thrift server?
Cheers,
Agnieszka
_AnonymousUser
Specialist III
Author

I have identified the problem.
I was using the "embedded" connection. The Talend job showed that the connection was fine, but in reality the generated Java code contained a connection string that omitted the host and port I had specified in the settings. As a result, Talend was trying to execute the Hive query locally on my Windows machine.
Why is "embedded" wrong? Why does the tool pretend that the remote Hive connection worked fine? Why does it try to run the Hive query locally even though I have specified the remote host and port? I would consider it a bug...
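For readers comparing the two connection strings, here is a minimal sketch of what the JDBC URLs typically look like with the Hive 0.9.0 (HiveServer1) driver. The class name HiveJdbcUrl and the host hive.example.com are hypothetical, and this is not the code Talend generates; it only shows that the embedded URL carries no host or port.

```java
// Sketch: JDBC URL shapes for the HiveServer1-era driver (Hive 0.9.0).
// Host, port and database values passed in below are placeholders.
class HiveJdbcUrl {
    // Driver class used by the old Hive JDBC client; it also appears in the stack trace package.
    static final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";

    // Standalone mode: the client talks to a remote Hive server over the network.
    static String standalone(String host, int port, String database) {
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    // Embedded mode: no host or port, so Hive runs inside the client JVM.
    static String embedded() {
        return "jdbc:hive://";
    }

    public static void main(String[] args) {
        System.out.println(standalone("hive.example.com", 10000, "default"));
        System.out.println(embedded());
    }
}
```

With the embedded URL the query is compiled and launched from the client machine, which matches the HiveServer$HiveServerHandler frames in the stack trace above.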
Anonymous
Not applicable

Please report a bug on our bugtracker, export your job and attach it!
Thank you!
Anonymous
Not applicable

Hello,
Here is the explanation:
There are two different ways to connect to Hive: standalone mode and embedded mode.
Standalone mode is a direct JDBC connection to the Hive server, which usually runs on port 10000.
Embedded mode is an indirect connection: a Hive server is embedded in your client job, and it connects to the Hive metastore through Thrift. The metastore Thrift server does not run on the same port as the Hive server.
Finally, to fix the issue you ran into above, you have to specify the JobTracker (there is an option for it in the components).
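As a rough illustration of what "specifying the JobTracker" amounts to in embedded mode, the sketch below builds the settings an embedded client would usually need on Hadoop 1.x. The property names mapred.job.tracker, fs.default.name and hive.metastore.uris are the standard Hadoop 1.x and Hive ones; the class name EmbeddedHiveSettings and all hosts and ports are hypothetical, and whether Talend applies them exactly this way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: settings an embedded Hive client typically needs to reach a remote cluster.
// All host names and ports used with these helpers are placeholders.
class EmbeddedHiveSettings {
    // Thrift URI of the remote metastore, the value of hive.metastore.uris.
    static String metastoreUri(String host, int port) {
        return "thrift://" + host + ":" + port;
    }

    // Hive SET statements that point job submission and the file system at the cluster.
    static List<String> setStatements(String jobTracker, String nameNodeUri) {
        List<String> statements = new ArrayList<>();
        statements.add("SET mapred.job.tracker=" + jobTracker);
        statements.add("SET fs.default.name=" + nameNodeUri);
        return statements;
    }

    public static void main(String[] args) {
        System.out.println("hive.metastore.uris=" + metastoreUri("metastore.example.com", 9083));
        for (String s : setStatements("jobtracker.example.com:8021", "hdfs://namenode.example.com:8020")) {
            System.out.println(s);
        }
    }
}
```

The statements would be run through the JDBC connection before the actual query, so that the MapReduce job is submitted to the cluster rather than launched on the workstation.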
HTH,
Rémy.