In Big Data Batch Spark jobs (but not MapReduce) I can see the tHiveOutput component. This component is not documented in the Help, though.
I have a use case to insert into a number of partitioned Hive tables in Parquet format. I would like to understand this component's behaviour to see if it is appropriate for my needs.
Many thanks. I have a few questions regarding the component, if that's OK.
1. Why is this component only available in Spark Big Data jobs and not in MapReduce jobs?
2. It's good to see that it has a Parquet option (which is what my target table uses). Does that include Snappy compression?
3. Does the component support partitioned Hive tables? i.e. will it correctly write records into files under the correct HDFS directory structure according to the "partitioned by" clause in the tables' DDL? (There is a plain-Spark sketch of what I mean at the end of this post.)
4. Does the component support bucketed Hive tables? i.e. will it correctly distribute records across the buckets according to the "clustered by" clause in the tables' DDL?
We are looking to use these features in the design of our Hive tables, so I'm hoping I can use Talend as a more elegant and efficient way to transform and load my Hive tables than hand-written Hive SQL INSERT INTO statements.
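For context, here is a minimal sketch of what I'm hoping the component does under the hood, written as a plain Spark (Scala) batch job rather than Talend-generated code. The database, table, and column names are invented for illustration, and the configuration values are assumptions based on how I'd do this outside Talend.

```scala
// NOT Talend-generated code: a hypothetical sketch of writing Snappy-compressed
// Parquet into a partitioned Hive table from a Spark batch job (questions 2 and 3).
import org.apache.spark.sql.SparkSession

object PartitionedHiveWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-hive-write-sketch")
      .enableHiveSupport()          // talk to the Hive metastore directly
      .getOrCreate()

    // Assumed target table (names are made up for illustration).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales.orders (
        |  order_id BIGINT,
        |  amount   DOUBLE
        |) PARTITIONED BY (order_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Question 2: Snappy compression for the Parquet files.
    spark.conf.set("spark.sql.parquet.compression.codec", "snappy")
    // Question 3: dynamic partitioning, so each record lands in the HDFS
    // directory matching its order_date value.
    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    val staged = spark.table("sales.orders_staging")   // assumed source table
    staged
      .select("order_id", "amount", "order_date")      // partition column last
      .write
      .insertInto("sales.orders")

    spark.stop()
  }
}
```

If tHiveOutput generates something broadly equivalent to this, it would save us maintaining hand-rolled Hive scripts.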
Hi Team, I have installed the Talend Sandbox and am trying to understand the job designs and components. I have some questions about Big Data batch job design.
1. I am not seeing the tHiveOutput and tHiveInput components in a Big Data batch job. If I want to read data from Hive tables, do I need to use the tJDBCInput component only?
2. I am not seeing Partitioners/Collectors in a Big Data batch job.
3. Is a Big Data batch job converted into Java and then executed on the Hadoop cluster? And is a Spark job's code converted into Scala and then executed on the Hadoop/Spark cluster?
Could you please confirm?
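To illustrate question 1, this is the kind of direct Hive read I was hoping for instead of going through JDBC. It is a hypothetical plain Spark (Scala) sketch, not Talend-generated code, and the table name is made up.

```scala
// Hypothetical sketch: reading a Hive table from a Spark batch job without JDBC.
import org.apache.spark.sql.SparkSession

object HiveReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-read-sketch")
      .enableHiveSupport()          // connect to the Hive metastore, no JDBC driver needed
      .getOrCreate()

    // Load the Hive table as a DataFrame and run a simple aggregation.
    val orders = spark.table("sales.orders")   // invented table name
    orders.groupBy("order_date").count().show()

    spark.stop()
  }
}
```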