<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Talend Spark jobs using Dataframes in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Talend-Spark-jobs-using-Dataframes/m-p/2278262#M53789</link>
    <description>&lt;P&gt;Hi All,&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I am new to Talend Big Data and am migrating all my DI jobs to Spark for faster execution.&lt;/P&gt; 
&lt;P&gt;I came across the tSQLRow component, which I read uses Spark SQL for execution. In my tests, operations such as joins and aggregations ran faster in tSQLRow than in components like tMap and tAggregateRow.&lt;/P&gt; 
&lt;P&gt;The only difference I could see was that the other Talend components work on RDDs, whereas tSQLRow works on Dataframes.&lt;/P&gt; 
&lt;P&gt;I was wondering whether those Talend components can also work on Dataframes instead of RDDs.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;With the current design I am moving almost every key-based operation into tSQLRow, which hampers the readability of my jobs.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Any comments regarding this would be appreciated.&lt;/P&gt;</description>
    <pubDate>Sat, 16 Nov 2024 07:40:21 GMT</pubDate>
    <dc:creator>nbang</dc:creator>
    <dc:date>2024-11-16T07:40:21Z</dc:date>
    <item>
      <title>Talend Spark jobs using Dataframes</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Talend-Spark-jobs-using-Dataframes/m-p/2278262#M53789</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I am new to Talend Big Data and am migrating all my DI jobs to Spark for faster execution.&lt;/P&gt; 
&lt;P&gt;I came across the tSQLRow component, which I read uses Spark SQL for execution. In my tests, operations such as joins and aggregations ran faster in tSQLRow than in components like tMap and tAggregateRow.&lt;/P&gt; 
&lt;P&gt;The only difference I could see was that the other Talend components work on RDDs, whereas tSQLRow works on Dataframes.&lt;/P&gt; 
&lt;P&gt;I was wondering whether those Talend components can also work on Dataframes instead of RDDs.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;With the current design I am moving almost every key-based operation into tSQLRow, which hampers the readability of my jobs.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Any comments regarding this would be appreciated.&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 07:40:21 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Talend-Spark-jobs-using-Dataframes/m-p/2278262#M53789</guid>
      <dc:creator>nbang</dc:creator>
      <dc:date>2024-11-16T07:40:21Z</dc:date>
    </item>
    <item>
      <title>Re: Talend Spark jobs using Dataframes</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Talend-Spark-jobs-using-Dataframes/m-p/2278263#M53790</link>
      <description>Talend generates Java code for each component, so somebody has to implement the RDD and/or Dataframe functions in that component. The two are totally different Spark Java APIs. 
&lt;BR /&gt;So the answer is no, unless you or somebody else is willing to modify the component and add a radio button to choose between RDD and Dataframe.</description>
      <pubDate>Wed, 12 Sep 2018 14:52:57 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Talend-Spark-jobs-using-Dataframes/m-p/2278263#M53790</guid>
      <dc:creator>Jesperrekuh</dc:creator>
      <dc:date>2018-09-12T14:52:57Z</dc:date>
    </item>
    <item>
      <title>Re: Talend Spark jobs using Dataframes</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Talend-Spark-jobs-using-Dataframes/m-p/2278264#M53791</link>
      <description>&lt;P&gt;Do you mean that Talend does not handle RDDs even in Spark jobs? I could see RDD-related functions in the generated code, and I could also see code related to Dataframes. However, tMap deals with RDDs and tSQLRow deals with Dataframes.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Sep 2018 07:26:58 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Talend-Spark-jobs-using-Dataframes/m-p/2278264#M53791</guid>
      <dc:creator>nbang</dc:creator>
      <dc:date>2018-09-14T07:26:58Z</dc:date>
    </item>
    <item>
      <title>Re: Talend Spark jobs using Dataframes</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Talend-Spark-jobs-using-Dataframes/m-p/2278265#M53792</link>
      <description>No, I mean there are some fundamental differences between them. Spark jobs will definitely handle both, but which one is used depends on how each component is built. 
&lt;BR /&gt; 
&lt;BR /&gt;Different components use different strategies. The question is which type (RDD, Dataframe, or Dataset) is most appropriate to use: 
&lt;A href="https://data-flair.training/blogs/apache-spark-rdd-vs-dataframe-vs-dataset/" target="_blank" rel="nofollow noopener noreferrer"&gt;https://data-flair.training/blogs/apache-spark-rdd-vs-dataframe-vs-dataset/&lt;/A&gt;</description>
      <pubDate>Fri, 14 Sep 2018 11:33:22 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Talend-Spark-jobs-using-Dataframes/m-p/2278265#M53792</guid>
      <dc:creator>Jesperrekuh</dc:creator>
      <dc:date>2018-09-14T11:33:22Z</dc:date>
    </item>
    <item>
      <title>Re: Talend Spark jobs using Dataframes</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Talend-Spark-jobs-using-Dataframes/m-p/2278266#M53793</link>
      <description>&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;I am using Talend 7.3.1. Kindly let me know whether Talend uses RDDs or Dataframes when I design a normal job without tSQLRow, using tMap, Azure Gen2, and Databricks 5.5 LTS.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Viswa&lt;/P&gt;</description>
      <pubDate>Mon, 14 Sep 2020 11:19:59 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Talend-Spark-jobs-using-Dataframes/m-p/2278266#M53793</guid>
      <dc:creator>Viswa560</dc:creator>
      <dc:date>2020-09-14T11:19:59Z</dc:date>
    </item>
  </channel>
</rss>

