
Import data (CSV, Excel, etc.) into Apache Spark
Hi,
I am new to Talend, but I have used similar ETL tools from Pentaho.
I want to use Talend to import, say, a large CSV file into Spark (as an RDD) and/or into Hadoop (HDFS).
I can do this import from the command line, but I would really prefer to use a GUI-based tool instead.
Can someone let me know whether Talend can do this?
I could not find any simple tutorials on the topic.
Paluee

Hello,
There is a tHDFSOutput component, which writes the data flows it receives into a given Hadoop distributed file system (HDFS).
Please take a look at the component reference: TalendHelpCenter:tHDFSOutput
Best regards
Sabrina

OK,
Thanks for this.
Is this component available in Talend Open Studio?
Is there a similar component for Apache Spark?

Hello,
The tHDFSOutput component is available when you are using one of the Talend solutions with Big Data, for example Talend Open Studio for Big Data.
Best regards
Sabrina

Hi there,
Thanks for your reply.
I actually discovered that Talend Open Studio for Big Data has the components for Hadoop just a little while before your reply here.
Next to that, the page showed that there seems to be a component for Spark, but only in the paid version, not the free one.
Can you confirm that this is the case?
Regards,
P

Hello,
Batch Processing (MapReduce, Spark), Native Hadoop Connectors, and Real-Time Processing (Spark Streaming) are available in the Talend subscription version, not in the open source version.
Please take a look at the Big Data product page: http://www.talend.com/products/big-data/
Best regards
Sabrina
