Anonymous
Not applicable

Import data (CSV, Excel, etc.) into Apache Spark

Hi,

 

I am new to Talend.

But I have used similar ETL tools from Pentaho.

 

I want to use Talend to import, say, a large CSV file into Spark (as an RDD) and/or into Hadoop (HDFS).

I can do this import with commands at the command line, but I would really prefer to use a GUI-based tool instead.
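
For context, the command-line route looks roughly like the sketch below. It only covers the HDFS side and wraps the standard `hdfs dfs -put` command in Python. The helper names (`build_hdfs_put_command`, `upload_csv`) and the paths are hypothetical, and it assumes an `hdfs` client is installed and on the PATH.

```python
import shutil
import subprocess


def build_hdfs_put_command(local_path, hdfs_path, overwrite=False):
    """Build the argument list for `hdfs dfs -put` (hypothetical helper)."""
    command = ["hdfs", "dfs", "-put"]
    if overwrite:
        # -f overwrites the destination if it already exists.
        command.append("-f")
    command += [local_path, hdfs_path]
    return command


def upload_csv(local_path, hdfs_path, overwrite=False):
    """Run the upload; returns None when no `hdfs` client is found on PATH."""
    command = build_hdfs_put_command(local_path, hdfs_path, overwrite)
    if shutil.which("hdfs") is None:
        return None
    # check=True raises CalledProcessError if the hdfs command fails.
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    print(" ".join(build_hdfs_put_command("big_file.csv", "/tmp/big_file.csv")))
```

This is the manual step I would like to replace with a graphical job.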

 

I hope that someone can let me know if Talend can do this.

I could not find any simple tutorials on this.

 

Hope someone can help on this topic.

 

Paluee

5 Replies
Anonymous
Not applicable
Author

Hello,

There is a tHDFSOutput component, which writes the data flows it receives into a given Hadoop Distributed File System (HDFS).

Please take a look at the component reference: Talend Help Center: tHDFSOutput

Best regards

Sabrina

 

 

Anonymous
Not applicable
Author

OK,

 

Thanks for this.

Is this component in Talend Open Studio?

Do they have a similar component for Apache Spark?

Anonymous
Not applicable
Author

Hello,

The tHDFSOutput component is available when you are using one of the Talend solutions with Big Data, for example Talend Open Studio for Big Data.

Best regards

Sabrina

Anonymous
Not applicable
Author

Hi there,

Thanks for your reply.

I actually discovered that Talend Open Studio for Big Data has the components for Hadoop

just a little while before your reply here.

 

Next to it, it showed that there seems to be a component for Spark, but only in the paid version, not in the free version.

Can you confirm that this is the case?

 

Regards,

 

P

 

Anonymous
Not applicable
Author

Hello,

Batch Processing (MapReduce, Spark), Native Hadoop Connectors, and Real-Time Processing (Spark Streaming) are available in the Talend subscription version, not in the open source version.

Please take a look at the Big Data product page: http://www.talend.com/products/big-data/

Best regards

Sabrina