Anonymous
Not applicable

Read HDFS file fails with "The import org.apache.spark.ml.linalg.Vector cannot be resolved"

Hi,

 

I'm trying to read a CSV file from HDFS. So far, I have only two components on my canvas: tHDFSConfiguration and tFileInputDelimited. When I run the job, it fails with "The import org.apache.spark.ml.linalg.Vector cannot be resolved". I'm running Talend Studio 7.0.1 with a Cloudera distribution, by the way.
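For context: this message is a Java compile error. Talend generates Java code for the job, and that generated code imports `org.apache.spark.ml.linalg.Vector`, which the compiler cannot find on the job's classpath. As a rough, hypothetical check (the class name `ClassCheck` is made up for illustration), the standalone program below reports whether given class names resolve on the classpath it is launched with. It checks this small program's runtime classpath, not Studio's build classpath, so launch it with the same JARs you expect the job to use.

```java
// Hypothetical diagnostic: reports whether fully qualified class names
// can be resolved by the class loader of this program.
public class ClassCheck {

    // Returns true if the class can be located (without running its static initializers).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class itself is absent.
            // LinkageError: the class was found but something it depends on is absent.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from the error message if no names are passed.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.spark.ml.linalg.Vector"};
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Usage, assuming the JARs you want to test are in a folder named `lib`: `java -cp "lib/*:." ClassCheck org.apache.spark.ml.linalg.Vector` (use `;` instead of `:` on Windows).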

 

How can I solve this?

 

Regards,

Veronica

8 Replies
manodwhb
Champion II

@veee, can you show your job design? Is it a Standard job or a Big Data Batch job?

Anonymous
Not applicable
Author

Hi @manodwhb,

Here's the screenshot of the job. It's a Big Data Batch job.

[Screenshot: job design]

 

If it helps, I was able to push a file into HDFS with a standard job. So the HDFS connection should be okay...

manodwhb
Champion II

@veee, can you connect tFileInputDelimited to a tLogRow, run the job, and show me the result? Is the Hadoop cluster running fine?

Anonymous
Not applicable
Author

Hi Veronica,

 

Could you please double-check whether you have installed all the JARs this component needs? The error says that one of the class imports cannot be resolved.

 

Could you please open a new job, add the tHDFSConfiguration component again, and see what happens? If it prompts you to install any libraries for the component, please install them. I would also suggest quickly switching to the Code tab of your job to check whether it shows any errors.

 

Warm Regards,

 

Nikhil Thampi

Anonymous
Not applicable
Author

@manodwhb: I've tested that as well, but the job itself fails to run with or without the tLogRow.

Hadoop cluster should be running fine as I'm able to push files into HDFS. I'm also able to choose the file I wanted to read from within Talend's tFileInputDelimited component.

Anonymous
Not applicable
Author

Hi @nthampi

 

Yes, I did exactly that when I first used the tHDFSConfiguration component in the canvas. I was prompted to install several libraries. But the job still fails to compile.

 

It does seem that there is a missing class that Talend was unable to detect. I checked the Code tab, and the class in question is indeed unresolved.

[Screenshot: Code tab showing the unresolved import]

 

I've installed the libraries as requested by the tHDFSConfiguration component though. Are there any additional libraries that I have to include? If yes, how do I add them to my project?
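One way to find out which library provides a missing class is to look inside candidate JAR files: a class `a.b.C` is stored in a JAR as the entry `a/b/C.class`. The sketch below is a hypothetical helper (its name and the folder you point it at are illustrative assumptions, for example a folder of JARs you have downloaded) that scans every `.jar` in a directory for that entry using only `java.util.jar.JarFile`.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: lists the JAR files in a directory that contain a given class.
public class FindClassInJars {

    // Converts "a.b.C" into the JAR entry name "a/b/C.class".
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    static List<File> findJarsContaining(File dir, String className) {
        List<File> matches = new ArrayList<>();
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return matches; // not a directory, or not readable
        }
        String entry = toEntryName(className);
        for (File f : jars) {
            try (JarFile jar = new JarFile(f)) {
                if (jar.getJarEntry(entry) != null) {
                    matches.add(f);
                }
            } catch (IOException e) {
                // Skip files that cannot be opened as JARs (e.g. corrupt downloads).
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        String className = args.length > 1 ? args[1] : "org.apache.spark.ml.linalg.Vector";
        List<File> found = findJarsContaining(dir, className);
        if (found.isEmpty()) {
            System.out.println("No JAR in " + dir + " contains " + className);
        } else {
            for (File f : found) {
                System.out.println(className + " found in " + f.getName());
            }
        }
    }
}
```

Only the directory itself is scanned (no subfolders), which keeps the sketch short; a recursive walk with `java.nio.file.Files.walk` would be the natural extension.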

Anonymous
Not applicable
Author

Hi,

 

You can add any additional libraries using the tLibraryLoad component. But that is not the problem here: all the associated libraries should be installed automatically once you give permission to install them.
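Conceptually, loading an additional library means making one more JAR visible to the class loader. The plain-Java sketch below does not reproduce what tLibraryLoad does inside Studio; it only illustrates the general JVM mechanism, using a hypothetical class name (`LoadFromJar`) and a hypothetical default JAR path. It creates a `URLClassLoader` over the extra JAR and checks whether a class resolves through it.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

// Illustration only: checks whether a class resolves once an extra JAR is
// added to the class path via a child URLClassLoader.
public class LoadFromJar {

    static boolean canLoad(File jar, String className) throws IOException {
        URL[] urls = {jar.toURI().toURL()};
        // Standard delegation: the parent loader is asked first, then the extra JAR.
        try (URLClassLoader loader = new URLClassLoader(urls, LoadFromJar.class.getClassLoader())) {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers classes whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real JAR location as the first argument.
        File jar = new File(args.length > 0 ? args[0] : "spark-mllib-local_2.11-2.0.2.jar");
        String className = "org.apache.spark.ml.linalg.Vector";
        boolean ok = canLoad(jar, className);
        System.out.println(className + (ok ? " resolves" : " does not resolve") + " via " + jar);
    }
}
```

Runtime loading like this does not fix a compile error by itself: the generated code is compiled against its build classpath, so the JAR has to be known at build time too. Judging by how this thread ends, adding the JAR through tLibraryLoad takes care of that in Studio.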

 

@xdshi, Could you please share some advice here?

 

Warm Regards,

 

Nikhil Thampi

Anonymous
Not applicable
Author

Hi,

 

I'm not sure whether this can be considered a solution, but I managed to get the job running by adding a tLibraryLoad component to load spark-mllib-local_2.11-2.0.2.jar which I had to download manually online.

 

I don't think this happened back when I was using Talend 6.4.1, though.

 

Thanks for the help, anyway!