Anonymous

Connecting to HDFS from a Windows environment (newbie to Big Data)

Hi,
I am new to TOS_BD. I was trying to connect to a remote HDFS system and write some files there, but I keep running into errors. I have attached the job; please have a look at it and suggest a fix.

Process flow description:
-------------------
1) Connected to the remote Linux system using tSSH.
2) Used tHDFSConnection to connect to the HDFS system on that machine.
3) Used tHDFSPut to put the file into HDFS.

Error Log:
-------------------
Starting job test02 at 12:04 13/08/2012.

connecting to socket on port 3489
connected
Talend Open Studio
Exception in component tHDFSPut_1
java.net.UnknownHostException: unknown host: cmtest001
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
at org.apache.hadoop.ipc.Client.call(Client.java:720)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at test_proj.test02_0_1.test02.tHDFSPut_1Process(test02.java:530)
at test_proj.test02_0_1.test02.tHDFSConnection_1Process(test02.java:469)
at test_proj.test02_0_1.test02.tSSH_1Process(test02.java:382)
at test_proj.test02_0_1.test02.runJobInTOS(test02.java:862)
at test_proj.test02_0_1.test02.main(test02.java:730)
disconnected
Job test02 ended at 12:04 13/08/2012.
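A note on reading this trace: the Hadoop client runs inside the job's own `main` method (`test02.main`), i.e. on the Windows machine, not inside the tSSH session. So `cmtest001` has to be resolvable from the machine that runs the job. The Java sketch below reproduces the kind of lookup that appears to fail here: `java.net.InetSocketAddress` tries to resolve the name when it is constructed, and Hadoop's IPC client seems to report `unknown host` when that address comes back unresolved. The class name and port 8020 are assumptions for illustration only.

```java
import java.net.InetSocketAddress;

public class NameNodeHostCheck {
    // Hypothetical placeholder: use the port from your own NameNode URI.
    static final int NAMENODE_PORT = 8020;

    /**
     * Returns true if the local resolver (DNS or the hosts file) can turn
     * the host name into an IP address. InetSocketAddress attempts the
     * lookup in its constructor and marks the address as unresolved when
     * the lookup fails.
     */
    static boolean canResolve(String host) {
        return !new InetSocketAddress(host, NAMENODE_PORT).isUnresolved();
    }

    public static void main(String[] args) {
        // Defaults to the host name from the error log.
        String host = args.length > 0 ? args[0] : "cmtest001";
        if (canResolve(host)) {
            System.out.println(host + " resolves; name resolution is not the problem");
        } else {
            System.out.println(host + " does not resolve from this machine; "
                    + "add it to DNS or to the local hosts file");
        }
    }
}
```

Run it on the same Windows box as Talend; if it reports that the name does not resolve, the HDFS components will fail the same way regardless of the tSSH step.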

Anonymous

Can TOS_BD connect to a remote HDFS?
--
Regards,
Vinod
Anonymous

Any help would be appreciated.
--
Regards,
Vinod
Anonymous

I haven't tried it with tSSH, but TOSBD certainly works when that isn't a requirement. See the Talend channel on YouTube for a good number of Big Data examples, although none of them use tSSH.
Ciaran
Anonymous

Thanks a lot for the reply.
Unfortunately I cannot access YouTube from here, but I will try to browse through the videos later.
From your reply, I understand that it certainly works when Talend is installed in the Big Data environment, on the same Linux box.
I will get back to you after trying the suggested options. At the moment I am planning to reformat my system to make room for a Hadoop environment where I can test HDFS get/put.
--
Vinod
Anonymous

Hi
I run Hadoop on a VM and connect to it from TOSBD over HDFS and/or the Templeton interface. TOSBD doesn't require you to use SSH.
Ciaran
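Without SSH, the HDFS client in the job talks to the cluster directly: it opens a TCP connection to the NameNode's RPC port, and when writing a file it streams the blocks to the DataNodes. From a remote Windows machine those ports therefore have to be reachable through any firewall in between. Below is a minimal Java reachability check under stated assumptions: the class name is hypothetical, and 8020 is only a common default; use the port from your own NameNode URI (`fs.default.name` / `fs.defaultFS`). It only tests the NameNode; DataNode ports would need the same check.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class NameNodePortCheck {
    /**
     * Returns true if a plain TCP connection to host:port succeeds within
     * timeoutMs. Any IOException (unknown host, connection refused,
     * timeout) is reported as "not reachable".
     */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults: host from the error log, a common NameNode RPC port.
        String host = args.length > 0 ? args[0] : "cmtest001";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8020;
        System.out.println(host + ":" + port + " reachable: "
                + isReachable(host, port, 3000));
    }
}
```

A successful connect only shows that the port is open; it says nothing about HDFS permissions or whether the service behind it is actually the NameNode.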
Anonymous

Hi Ciaran,
I believe that since the VM and HDFS are in the same environment, maybe that is why they do not require authentication. But what about accessing the HDFS server from a completely different location? In my case, I am running TOS_BD on Windows XP, and from this box I am trying to copy some files to a different machine, a Hadoop server located geographically somewhere else.
As I understand it, this should not matter, since I have all the access credentials.
This is a real-world need, too: we are planning to migrate all our files to our Hadoop environment for several reasons. One is to put all the files in a high-speed central location, so that later, if required, we can perform analytics and use the features of the Big Data environment.
--
Regards,
Vinod
Anonymous

Hi Vinod,
I am trying to do the same: connecting my TOS (on Windows XP) to a remote Ubuntu system where HDFS is located.
I am getting the same error as you.
Any help would be appreciated.
Regards,
Nayan.
Anonymous

Hello,
It's entirely possible to connect to HDFS from a Windows client (Talend on Windows and a remote cluster).
Looking at your error, there are several possibilities:
1 - The NameNode hostname and IP are not mapped correctly on the NameNode side (/etc/hosts file).
2 - The NameNode hostname and IP are not mapped correctly on the client side (hosts file in the C:\Windows\System32\drivers\etc folder).
3 - The component parameters are not set correctly. What have you entered in the NameNode URI parameter?
Rémy.
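Checks 2 and 3 above can be partly automated. The Java sketch below is hypothetical (the class and method names are made up, not part of Talend or Hadoop): it extracts the host from a NameNode URI of the form `hdfs://host:port` and looks that host up in hosts-file text, the "IP name [aliases]" format used both by `/etc/hosts` on the NameNode and by the hosts file under `C:\Windows\System32\drivers\etc` on the client. The IP address (a documentation range) and the port in the sample are placeholders.

```java
import java.net.URI;
import java.util.Optional;

public class NameNodeConfigCheck {
    /**
     * Extracts the NameNode host from a URI such as "hdfs://host:port".
     * Throws IllegalArgumentException if the URI is malformed, does not
     * use the hdfs scheme, or has no host part.
     */
    static String nameNodeHost(String nameNodeUri) {
        URI uri = URI.create(nameNodeUri);
        if (!"hdfs".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("expected an hdfs:// URI: " + nameNodeUri);
        }
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("no host in URI: " + nameNodeUri);
        }
        return uri.getHost();
    }

    /**
     * Looks a host name up in hosts-file text ("IP name [aliases...]" per
     * line, '#' starts a comment) and returns the mapped IP, if any.
     */
    static Optional<String> lookupInHosts(String hostsText, String host) {
        for (String line : hostsText.split("\\R")) {
            int hash = line.indexOf('#');
            String content = hash >= 0 ? line.substring(0, hash) : line;
            String[] fields = content.trim().split("\\s+");
            for (int i = 1; i < fields.length; i++) {   // fields[0] is the IP
                if (fields[i].equalsIgnoreCase(host)) {
                    return Optional.of(fields[0]);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample: 192.0.2.10 and port 8020 are placeholders.
        String hosts = "127.0.0.1   localhost\n"
                + "192.0.2.10  cmtest001   # Hadoop NameNode\n";
        String host = nameNodeHost("hdfs://cmtest001:8020");
        System.out.println(host + " -> " + lookupInHosts(hosts, host).orElse("not mapped"));
        // prints "cmtest001 -> 192.0.2.10"
    }
}
```

A value like `cmtest001:8020` without the `hdfs://` prefix is rejected by `nameNodeHost`, which is one easy way to get the NameNode URI parameter wrong.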
_AnonymousUser

Sh.t! Do Talend's helpdesk people know what their tools are made for? It's just reply after reply, a waste of time!