Anonymous
Not applicable

[resolved] Simple tHDFSPut based job fails on 6.1 but not 6.0

Hi guys,
I have a simple job that uses a tFileList to iterate over a directory on my local machine, then uses a tHDFSPut to upload the files to my HDFS. I have made no changes since v6.0, where it used to work. Now I get the following error message after it has loaded the first file:
Exception in component tHDFSPut_1
java.io.IOException: DataStreamer Exception:
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:796)
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1752)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1530)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1483)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:668)


The UnresolvedAddressException seems strange, since I ran the job perfectly well with the same settings in v6.0, and the 6.1 job was actually able to load a single file before hitting this error. To show the structure of the job, I am including a screenshot. The tJava is just in place to log some things for me; it does nothing else.

[Screenshot: layout of the job, showing the tFileList, tJava and tHDFSPut components]
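For reference, what the job does is roughly equivalent to the following standalone sketch against the Hadoop FileSystem API (the namenode URI, local directory and HDFS target directory are hypothetical placeholders, not values from the actual job):

import java.io.File;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connect to the namenode (hypothetical address).
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.0.10:8020"), conf);

        File localDir = new File("C:/data/input");       // directory iterated by tFileList
        Path targetDir = new Path("/user/talend/input");  // HDFS directory targeted by tHDFSPut

        // Copy every regular file in the local directory up to HDFS,
        // which is essentially what the tFileList -> tHDFSPut pair does.
        File[] files = localDir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile()) {
                    fs.copyFromLocalFile(new Path(f.getAbsolutePath()), targetDir);
                }
            }
        }
        fs.close();
    }
}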
I have raised another question about a warning box I get when I try to load and run this job. It tells me I am missing a jar (horrendously named ...hadoop-conf-_jW{with lots of other characters and numbers}.jar), and I have seen this raised a few times (with different characters and numbers, but always "hadoop-conf"). I suspect that this might be something completely different, but I mention it here just in case.
****FAO Talend****
I have been using your tools for years and have grown to accept that there are small teething troubles between versions. That is fine. But I have had nothing but trouble with the Big Data edition in EVERY version I have used. You need to ensure that...
1) The products are tested properly
2) There are decent tutorials that work
3) There is consistency in your approach
4) Simple things work
5) Regression between versions is minimised
I always do my best to advocate Talend. I can advocate DI and ESB with no problems. There are niggling issues with both, but there are workarounds, and the community is equipped with the knowledge to help people who are not aware of those issues. But with Big Data I am pulling my hair out over the lack of information provided by Talend and the apparent lack of knowledge in the community. This needs resolving.

Accepted Solutions
Anonymous
Not applicable
Author

OK, I have found the cause of this issue. The default for the hdfs-default.xml parameter "dfs.client.use.datanode.hostname" is "false". In v6.0, if you tick "Use Datanode Hostname" (true) when setting up connection metadata, that value is not automatically passed on to components that use a reference connection to your cluster. I don't recall ticking this box, but I must have. It never mattered in my v6.0 job, though, because none of the tHDFS components that referenced the connection ever picked the value up.
However, when I exported the job and imported it into v6.1 and v6.2, the "Use Datanode Hostname" value was passed correctly to the tHDFS components. Since my mini cluster has no resolvable hostnames (I use IP addresses), the HDFS client tried to open the write pipeline to the datanodes by hostname and failed with the UnresolvedAddressException above, which prevented the job from working.
So it appears this was actually caused by a bug fix rather than an introduced bug. There are a lot of unresolved issues on here relating to this; I will try to post a link to this thread in as many of them as I can.
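For anyone else hitting this: the "Use Datanode Hostname" checkbox corresponds to the standard HDFS client property dfs.client.use.datanode.hostname. Here is a minimal sketch of the two behaviours (not the code Talend generates; the namenode URI is a hypothetical placeholder):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DatanodeHostnameSetting {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // false: the client connects to datanodes using the addresses the namenode
        //        reports, which works on an IP-only cluster.
        // true:  the client resolves each datanode's hostname before opening the
        //        write pipeline; if those hostnames cannot be resolved from the
        //        client machine, the connect fails with
        //        java.nio.channels.UnresolvedAddressException, as in the trace above.
        conf.setBoolean("dfs.client.use.datanode.hostname", false);

        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.0.10:8020"), conf); // hypothetical namenode
        System.out.println("Connected to " + fs.getUri());
        fs.close();
    }
}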


Replies
Anonymous
Not applicable
Author

I have just downloaded v6.2 to test this with and I get the same error. So, the job works in v6.0, stops working against the same Hadoop cluster with v6.1 and continues to fail in v6.2. Has anyone seen this? Has this been raised as a Jira? Is there a workaround? 
Anonymous
Not applicable
Author

This exactly resolved my problem below: just uncheck the "Use Datanode Hostname" checkbox in tHDFSConnection if "dfs.client.use.datanode.hostname" is "false" in hdfs-default.xml.
Exception in component tHDFSOutput_1
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
 at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
 at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100)
 at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
 at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:320)
 at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
 at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
 at java.io.BufferedWriter.close(BufferedWriter.java:266)
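If you are not sure which value your client configuration actually ends up with, one way to check is to read it back from the loaded Hadoop configuration. A small sketch, assuming a hypothetical path to the hdfs-site.xml your job uses:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class CheckDatanodeHostnameValue {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hypothetical path: point this at the hdfs-site.xml shipped with your cluster's client config.
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        // Falls back to the hdfs-default.xml default ("false") if the property is not set.
        System.out.println("dfs.client.use.datanode.hostname = "
                + conf.get("dfs.client.use.datanode.hostname", "false"));
    }
}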
Anonymous
Not applicable
Author

I am still getting the same issue with tHDFSPut and other big data mappings. I am running 6.2 locally on Windows with EMR on AWS.

I can't even run a successful HDFS put following the tutorial examples. My HDFS and Hadoop connections are successful and I do see empty files (1) showing up on the name node. I am not using IP names anywhere (if I try to use IPs I get other issues).
Does tHDFSPut work or not in this version? If not, is there a workaround? If not, I can stop wasting my time troubleshooting AWS, networks and ports (FYI, I have AWS wide open on every port). I can't do client POCs with this product if it doesn't work for simple HDFS puts.
Thanks
Anonymous
Not applicable
Author

Hi all,
I still get the same problem.
The IP / hostname are correct.
Checking or unchecking the "Use Datanode Hostname" option doesn't resolve this issue.

Hortonworks (Docker)  + T.B.D 6.3.1
Anonymous
Not applicable
Author

Hi all,
I still get the same problem.
The IP / hostname are correct.
Checking or unchecking the "Use Datanode Hostname" option doesn't resolve this issue. The file was created but is empty.
@Lan, I don't understand what the ID is for you.

Hortonworks (Docker)  + T.B.D 6.3.1
Thanks