
Reading HDFS file fails with "The import org.apache.spark.ml.linalg.Vector cannot be resolved"
Hi,
I'm trying to read a CSV file from HDFS. So far, I have only two components on my canvas: tHDFSConfiguration and tFileInputDelimited. When the job is run, it fails with "The import org.apache.spark.ml.linalg.Vector cannot be resolved". I'm running Talend Studio 7.0.1 with a Cloudera distribution, by the way.
How can I solve this?
Regards,
Veronica

@veee, can you show your job design? And is it a Standard job or a Big Data Batch job?

Hi @manodwhb,
Here's the screenshot of the job. It's a Big Data Batch job.
If it helps, I was able to push a file into HDFS with a standard job. So the HDFS connection should be okay...

@veee, from the tFileInputDelimited, connect a tLogRow, execute the job, and show me the result. Is the Hadoop cluster running fine?

Hi Veronica,
Could you please double-check whether you have installed all the jars this component needs? The error says that one of the class imports is missing.
Could you please open a new job, add the tHDFSConfiguration component again, and see? If it prompts you to install any libraries for the component, please install them. I would also suggest checking for errors by quickly toggling to the Code tab of your job.
Warm Regards,
Nikhil Thampi

@manodwhb: I've tested that as well, but the job fails to run with or without the tLogRow.
The Hadoop cluster should be running fine, as I'm able to push files into HDFS. I'm also able to browse to the file I want to read from within Talend's tFileInputDelimited component.

Hi @nthampi,
Yes, I did exactly that when I first used the tHDFSConfiguration component on the canvas. I was prompted to install several libraries, but the job still fails to compile.
It does seem that there is a missing class that Talend was unable to detect. I looked at the Code tab, and the class in question is indeed missing.
I've installed the libraries requested by the tHDFSConfiguration component, though. Are there any additional libraries I have to include? If so, how do I add them to my project?

Hi,
You can add additional libraries using the tLibraryLoad component, but that is not the problem here. All the associated libraries should be installed automatically once you give permission to install them.
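To confirm the class really is absent from the job's classpath, a quick standalone check like this can help (a minimal sketch, not Talend-generated code; the class name is taken from the error in this thread):

```java
public class ClasspathCheck {
    // Returns true if the named class can be loaded from the current classpath.
    static boolean hasClass(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class from the compile error reported above.
        String cls = "org.apache.spark.ml.linalg.Vector";
        System.out.println(cls + (hasClass(cls) ? " is on the classpath" : " is missing"));
    }
}
```

If this prints "missing" when run with the same classpath as the job, the Spark MLlib jar that provides `org.apache.spark.ml.linalg` was not pulled in.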
@xdshi, Could you please share some advice here?
Warm Regards,
Nikhil Thampi

Hi,
I'm not sure whether this counts as a solution, but I managed to get the job running by adding a tLibraryLoad component to load spark-mllib-local_2.11-2.0.2.jar, which I had to download manually.
I don't think this happened when I was using Talend 6.4.1, though.
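In case it helps anyone hitting the same error, here's a quick way to verify that a manually downloaded jar actually contains the missing class before wiring it into tLibraryLoad (a minimal sketch; the jar filename below is the one from this thread, and the path is wherever you saved the download):

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

public class JarCheck {
    // Returns true if the jar at `path` contains an entry for the given class.
    static boolean jarHasClass(String path, String className) throws IOException {
        String entry = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(path)) {
            return jar.getJarEntry(entry) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Path to the manually downloaded jar (adjust to where you saved it).
        String path = "spark-mllib-local_2.11-2.0.2.jar";
        if (new File(path).exists()) {
            System.out.println(jarHasClass(path, "org.apache.spark.ml.linalg.Vector")
                    ? "Vector present in jar" : "Vector absent from jar");
        } else {
            System.out.println("jar not found at " + path);
        }
    }
}
```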
Thanks for the help, anyway!
