Anonymous
Not applicable

Error while running tHDFSCopy

Hi everybody,
I am testing Talend Open Studio for Big Data 5.5.0 against a Hadoop cluster on AWS (Cloudera distribution, CDH4.4.0). I have a file called customer.csv that I am trying to copy from my home directory to a subdirectory called /new. I set up a job that consists of a single component, tHDFSCopy. The job runs for a while, producing an EMPTY customer.csv in the target directory, and ends with the following error:
Exception in component tHDFSCopy_1
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-425321293-10.15.244.108-1401446443266:blk_8047645766350991207_142708 file=/user/kpopov/customer.csv
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:839)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:531)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)
at java.io.DataInputStream.read(Unknown Source)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:260)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:232)
at copyfileinhdfs.copyfileinhdfs_0_1.CopyFileInHDFS.tHDFSCopy_1Process(CopyFileInHDFS.java:339)
at copyfileinhdfs.copyfileinhdfs_0_1.CopyFileInHDFS.runJobInTOS(CopyFileInHDFS.java:589)
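For reference, the failing call in the trace is Hadoop's FileUtil.copy. A rough standalone sketch of what I am asking the component to do (the NameNode URI is an assumption based on my connection settings; this is not the actual generated code):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CopyFileInHdfsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode URI is assumed; 8020 is the default NameNode RPC port.
        FileSystem fs = FileSystem.get(new URI("hdfs://10.15.244.108:8020"), conf, "kpopov");
        Path src = new Path("/user/kpopov/customer.csv");
        Path dst = new Path("/user/kpopov/new/customer.csv");
        // The same call the job fails in: streams the source blocks from the
        // datanodes and writes them to the destination file.
        boolean ok = FileUtil.copy(fs, src, fs, dst, false /* deleteSource */, conf);
        System.out.println("copy succeeded: " + ok);
    }
}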
Can anyone tell me what is going on?
4 Replies
Anonymous
Not applicable
Author

Have you checked this scenario: https://help.talend.com/pages/viewpage.action?pageId=9310644#ychen-20120907-bigdata-thdfslist_scenar...
Is the connection OK?
Vaibhav
Anonymous
Not applicable
Author

Yes, the connection is correct: the IP is right, the port (8020) is right, and the Hadoop version is correct. When I open the Component view and click the button next to the File Name field, Open Studio connects to HDFS fine and lets me choose a target directory for the file. The only problem is, as I said before, that the copied file turns out to be empty and Open Studio ends the job with the error above.
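As far as I understand, browsing like that only exercises the NameNode on port 8020 (metadata operations), not the datanodes. A minimal sketch of the kind of call that succeeds for me (URI and user assumed from my settings):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsDirSketch {
    public static void main(String[] args) throws Exception {
        // Listing a directory is a NameNode-only (metadata) operation, so it
        // can succeed even when traffic to the datanodes is blocked.
        FileSystem fs = FileSystem.get(
                new URI("hdfs://10.15.244.108:8020"), new Configuration(), "kpopov");
        for (FileStatus status : fs.listStatus(new Path("/user/kpopov"))) {
            System.out.println(status.getPath());
        }
    }
}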
Anonymous
Not applicable
Author

OK, it turns out the issue was that port 50010, which the datanodes use for data transfer, was closed.
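For anyone hitting the same thing: a quick way to check whether a datanode's transfer port is reachable from the machine running the job (the host below is a placeholder; 50010 is the default dfs.datanode.address port in CDH4):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class DataNodePortCheck {
    public static void main(String[] args) {
        String host = "10.15.244.108"; // placeholder: substitute each datanode host
        int port = 50010;              // default datanode data-transfer port
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 5000 /* ms timeout */);
            System.out.println(host + ":" + port + " is reachable");
        } catch (IOException e) {
            System.out.println(host + ":" + port + " is NOT reachable: " + e.getMessage());
        }
    }
}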
Anonymous
Not applicable
Author

Hi kpopov,
Is the component working well for you? If the issue is fixed, may I ask you to click the "Set this topic as resolved" link which is right underneath your initial post? This way, other users will be informed that this thread has been resolved.
Many thanks
Best regards
Sabrina