nbang
Contributor

Talend Spark jobs using Dataframes

Hi All,

I am new to Talend Big Data and am migrating all my DI jobs to Spark for faster execution.

I came across the tSQLRow component, which, from what I have read, uses Spark SQL for execution. In my jobs, operations such as joins and aggregations ran faster in tSQLRow than in components like tMap and tAggregateRow.

The only difference I could see is that those Talend components work on RDDs, whereas tSQLRow works on DataFrames.

Can the other Talend components also work on DataFrames instead of RDDs?

With the current design I am moving almost every key-based operation into tSQLRow, which is hurting the readability of my jobs.

Any comments on this would be appreciated.

4 Replies
Jesperrekuh
Specialist

Talend components generate Java code, so somebody has to implement each component against the RDD API and/or the DataFrame API, and these are two quite different Spark Java APIs.
So the answer is no, unless you or somebody else is willing to modify the component and add a radio button to choose between RDD and DataFrame.

nbang
Contributor
Author

Do you mean that Talend does not handle RDDs even in Spark jobs? I can see RDD-related functions in the generated code, and I can also see DataFrame-related code. However, tMap deals with RDDs and tSQLRow deals with DataFrames.

Jesperrekuh
Specialist

No. I mean there are some fundamental differences between them. Spark jobs will definitely handle both, but which one is used depends on how each component is built.

Different components use different strategies. The question is which type (RDD, DataFrame, or Dataset) is most appropriate to use: https://data-flair.training/blogs/apache-spark-rdd-vs-dataframe-vs-dataset/

Viswa560
Contributor

Hi Team,

I am using Talend 7.3.1. Kindly let me know whether Talend uses RDDs or DataFrames when I design a normal job without tSQLRow, using tMap, Azure Gen2, and Databricks 5.5 LTS.

Thanks,

Viswa