
Reading HDFS file fails with "The import org.apache.spark.ml.linalg.Vector cannot be resolved"
Hi,
I'm trying to read a CSV file from HDFS. So far, I have only two components on my canvas: tHDFSConfiguration and tFileInputDelimited. When the job is run, it fails with "The import org.apache.spark.ml.linalg.Vector cannot be resolved". I'm running Talend Studio 7.0.1 with a Cloudera distribution, by the way.
How can I solve this?
Regards,
Veronica

@veee, can you show your job design? And is it a Standard job or a Big Data Batch job?

Hi @manodwhb,
Here's the screenshot of the job. It's a Big Data Batch job.
If it helps, I was able to push a file into HDFS with a standard job. So the HDFS connection should be okay...

@veee, from the tFileInputDelimited, connect a tLogRow, execute the job, and show me the result. Is the Hadoop cluster running fine?

Hi Veronica,
Could you please double-check whether you have installed all the jars this component needs? The error says that one of the class imports is missing.
Could you please open a new job, add the tHDFSConfiguration component again, and see? If it prompts you to install any libraries for the component, please install them. I would also suggest checking for errors by quickly toggling to the Code tab of your job.
Warm Regards,
Nikhil Thampi

@manodwhb: I've tested that as well, but the job fails to run with or without the tLogRow.
The Hadoop cluster should be running fine, as I'm able to push files into HDFS. I'm also able to browse to the file I want to read from within Talend's tFileInputDelimited component.

Hi @nthampi,
Yes, I did exactly that when I first used the tHDFSConfiguration component on the canvas. I was prompted to install several libraries, but the job still fails to compile.
It does seem that there is a missing class that Talend was unable to detect. I looked at the Code tab, and the class in question is indeed missing.
I've installed the libraries requested by the tHDFSConfiguration component, though. Are there any additional libraries I have to include? If so, how do I add them to my project?

Hi,
You can add additional libraries using the tLibraryLoad component, but that is not the problem here. All the associated libraries should be installed automatically once you give permission to install them.
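To confirm the class really is absent from the job's classpath, a quick standalone check like this can help (a minimal sketch, not Talend-generated code; the class name is taken from the error in this thread):

```java
public class ClasspathCheck {
    // Returns true if the named class can be loaded from the current classpath.
    static boolean hasClass(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class from the compile error reported above.
        String cls = "org.apache.spark.ml.linalg.Vector";
        System.out.println(cls + (hasClass(cls) ? " is on the classpath" : " is missing"));
    }
}
```

If this prints "missing" when run with the same classpath as the job, the Spark MLlib jar that provides `org.apache.spark.ml.linalg` was not pulled in.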
@xdshi, Could you please share some advice here?
Warm Regards,
Nikhil Thampi

Hi,
I'm not sure whether this counts as a solution, but I managed to get the job running by adding a tLibraryLoad component to load spark-mllib-local_2.11-2.0.2.jar, which I had to download manually.
I don't think this happened when I was using Talend 6.4.1, though.
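In case it helps anyone hitting the same error, here's a quick way to verify that a manually downloaded jar actually contains the missing class before wiring it into tLibraryLoad (a minimal sketch; the jar filename below is the one from this thread, and the path is wherever you saved the download):

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

public class JarCheck {
    // Returns true if the jar at `path` contains an entry for the given class.
    static boolean jarHasClass(String path, String className) throws IOException {
        String entry = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(path)) {
            return jar.getJarEntry(entry) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Path to the manually downloaded jar (adjust to where you saved it).
        String path = "spark-mllib-local_2.11-2.0.2.jar";
        if (new File(path).exists()) {
            System.out.println(jarHasClass(path, "org.apache.spark.ml.linalg.Vector")
                    ? "Vector present in jar" : "Vector absent from jar");
        } else {
            System.out.println("jar not found at " + path);
        }
    }
}
```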
Thanks for the help, anyway!
