Hi All,
We have Apache Hadoop 2.7.3 installed on a Linux machine, and we want to connect to the Hadoop cluster and read a sample HDFS file. In the metadata wizard for configuring the Hadoop cluster, when I select Apache as the distribution I only get the option for version 1.0; no other versions are available. Please let me know how I can connect to Hadoop 2.7.3 from Talend.
I have also created a new job with a tHDFSConnection component and provided the NameNode URI and user name, then added tHDFSInput and tLogRow components to read an HDFS file, but the job fails with the message "Connection refused". Can anyone tell me the exact process for reading an HDFS file from an Apache Hadoop cluster?
Could you please show us the full stack trace printed on the console? A screenshot of your tHDFSConnection component settings would also help us address your issue.
Best regards
Sabrina
Hi Sabrina,
Below are the details:
1. New cluster connection parameters: image attached.
2. I am not sure which libraries we should specify, internal or external. If external, where can we get all the JAR files, and are all of them required? Screenshot attached.
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.talend.core.utils.ReflectionUtils.invokeStaticMethod(ReflectionUtils.java:229)
at org.talend.designer.hdfsbrowse.hadoop.service.check.provider.CheckedNamenodeProvider.check(CheckedNamenodeProvider.java:63)
at org.talend.designer.hdfsbrowse.hadoop.service.check.AbstractCheckedServiceProvider$1.run(AbstractCheckedServiceProvider.java:45)
at org.talend.designer.hdfsbrowse.hadoop.service.check.CheckedWorkUnit$1.call(CheckedWorkUnit.java:65)
at java.util.concurrent.FutureTask.run(Unknown Source)
... 3 more
Caused by: java.net.ConnectException: Call to 10.223.66.228/10.223.66.228:54310 failed on connection exception: java.net.ConnectException: Connection refused: no further information
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
at org.apache.hadoop.ipc.Client.call(Client.java:1071)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy81.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:251)
... 12 more
Caused by: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
at org.apache.hadoop.ipc.Client.call(Client.java:1046)
Hi,
It looks like a connection issue. Can you connect to your Hadoop cluster and read a sample HDFS file through a client, without using the Talend tool?
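For example, a minimal standalone read with the Hadoop client API could look like the sketch below. It assumes the Hadoop 2.7.3 client jars are on the classpath; the NameNode URI, user name and file path are placeholders that you would replace with your own values, and the URI should match fs.defaultFS in the cluster's core-site.xml.

import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder values - replace with your NameNode URI, user and file path
        String nameNodeUri = "hdfs://10.223.66.228:54310";
        String hdfsUser = "hadoop";
        String filePath = "/tmp/sample.txt";

        Configuration conf = new Configuration();
        // Connect to HDFS as the given user (simple authentication)
        FileSystem fs = FileSystem.get(URI.create(nameNodeUri), conf, hdfsUser);
        try (InputStream in = fs.open(new Path(filePath))) {
            // Print the file contents to the console
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            fs.close();
        }
    }
}

If this small program can read the file, the same URI and user name should work in tHDFSConnection.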
Have you already installed the required external JAR files? Could you please take a look at the document TalendHelpCenter: Installing external modules?
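Since the job fails with "Connection refused", it is also worth checking whether anything is listening on the NameNode address shown in your stack trace (10.223.66.228:54310) from the machine that runs the Talend job. A bare socket test like the sketch below, using only the JDK, is enough for that; the host and port are taken from the error and may need adjusting.

import java.net.InetSocketAddress;
import java.net.Socket;

public class NameNodePortCheck {
    public static void main(String[] args) throws Exception {
        // Host and port taken from the "Connection refused" line in the stack trace
        String host = "10.223.66.228";
        int port = 54310;

        try (Socket socket = new Socket()) {
            // Throws ConnectException if nothing is listening on host:port
            socket.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("Reached " + host + ":" + port);
        }
    }
}

If this also fails with "Connection refused", the NameNode is not reachable on that host and port, so compare the port in your NameNode URI against fs.defaultFS (or fs.default.name) in the cluster's core-site.xml.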
Best Regards
Sabrina