[resolved] Talend Open Studio for Big Data - Spark components not showing up
Hi
I downloaded Talend Open Studio for Big Data version 6.2.1 today and was looking to configure a Spark job. I have a complete Hadoop environment on my laptop with 3 nodes in the cluster, which includes a Spark cluster as well. Presently I am writing Scala code in IntelliJ and running my jobs, and they are working quite well.
In this context, I would like to see whether, instead of writing Scala code, I can use Talend to work with my Spark cluster.
However, after installing it, I cannot find the following:
1. I do not see "Big Data Batch" under Job Designs
2. I do not see Spark as an option under Big Data in the Component palette
Please let me know if there is any documentation on configuring Spark with Talend.
Regards
Balaji Krishnan
Hi, here is the Big Data product matrix page: https://www.talend.com/products/big-data. Could you please take a look at it to choose a Talend Big Data Integration solution with the feature set and licensing options that best fit your project and budget? Batch processing (MapReduce, Spark), native Hadoop connectors, and real-time processing (Spark Streaming) are available in the Talend subscription version, not the open-source one. Best regards, Sabrina
We are planning to buy a Talend Big Data licence. Please help! Is there any component in the Talend Big Data Sandbox for writing custom Spark code in Scala? Or do I need another Talend product to create Big Data streaming and batch jobs in which I can also write custom code? Please reply; thanks in advance. It's urgent.
From what I've seen, custom Spark code must be written in Java (the Java API for working with RDDs and pair RDDs, and also the DataFrame API, can be used). When creating a Spark job, the custom code can be put in the tJava and tSqlRow components. Both of those components contain comments explaining how to access the RDDs and DataFrames from your upstream inputs.
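For illustration, here is a minimal sketch of the kind of record-level logic the post describes putting in tJava: a map over the incoming records followed by a keyed aggregation, the same map + reduceByKey shape you would express with JavaPairRDD in real Spark code. Since Spark and the Talend-generated job context are not available outside a running job, plain `java.util.stream` collections stand in for the RDD here, and all names below (`CustomSparkSketch`, `upstreamRows`, `sumByKey`) are placeholders of my own, not Talend's actual API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CustomSparkSketch {
    // Stand-in for the upstream rows a tJava component would receive as a JavaRDD.
    static List<String> upstreamRows = Arrays.asList("a,1", "b,2", "a,3");

    // map + reduceByKey shape: parse each "key,value" record, then sum values per key.
    static Map<String, Integer> sumByKey(List<String> rows) {
        return rows.stream()
                .map(r -> r.split(","))                 // map: parse each record
                .collect(Collectors.groupingBy(
                        p -> p[0],                      // group by key
                        Collectors.summingInt(p -> Integer.parseInt(p[1])))); // reduce
    }

    public static void main(String[] args) {
        // Prints the aggregated totals, e.g. a -> 4, b -> 2 (map order may vary).
        System.out.println(sumByKey(upstreamRows));
    }
}
```

In an actual Spark job the equivalent would be a `mapToPair` on the upstream `JavaRDD` followed by `reduceByKey`; the comments Talend generates inside tJava/tSqlRow show which variable holds that upstream RDD.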