Anonymous
Not applicable

Load a CSV file into Hive Parquet table

Hello,

I have a CSV file with raw data and I'm trying to load it into a Hive table that uses the Parquet format. I found a way to do this, but I was wondering whether there is an easier way that requires only a single job.

Here's how I did it:

- a Big Data Batch job that reads the CSV file from HDFS (tFileInputDelimited) and writes it out as a Parquet file (tFileOutputParquet)

- a Standard job containing only the tHiveLoad component, which reads the Parquet file and loads it into the Hive table
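For comparison, the same CSV-to-Parquet flow can be written directly in HiveQL, outside of any Talend component. The snippet below is only a minimal sketch under stated assumptions: the helper `build_csv_to_parquet_hql`, the table names, the column list, and the HDFS path are all hypothetical and do not come from the job described above. It only builds the statements as strings; running them against Hive is left to whatever client is in use.

```python
def build_csv_to_parquet_hql(staging_table, target_table, columns, csv_dir, delimiter=","):
    """Return HiveQL statements that copy delimited text data into a Parquet table.

    All arguments are illustrative placeholders (assumptions, not values from the thread):
      staging_table -- external table laid over the CSV directory
      target_table  -- managed table stored as Parquet
      columns       -- list of (name, hive_type) pairs shared by both tables
      csv_dir       -- HDFS directory that contains the CSV file(s)
    """
    cols = ", ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return [
        # 1. Expose the raw CSV directory as a text-format external table.
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {staging_table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        f"STORED AS TEXTFILE LOCATION '{csv_dir}'",
        # 2. Create the destination table in Parquet format.
        f"CREATE TABLE IF NOT EXISTS {target_table} ({cols}) STORED AS PARQUET",
        # 3. Let Hive rewrite the rows from text into Parquet.
        f"INSERT INTO TABLE {target_table} SELECT * FROM {staging_table}",
    ]


if __name__ == "__main__":
    # Hypothetical names and path, for illustration only.
    statements = build_csv_to_parquet_hql(
        staging_table="raw_csv_staging",
        target_table="events_parquet",
        columns=[("id", "INT"), ("name", "STRING")],
        csv_dir="/user/example/raw_csv",
    )
    for stmt in statements:
        print(stmt + ";")
```

The staging table is external so that dropping it later leaves the original CSV file in HDFS untouched.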

 

My question is: is there a way to do this in a single job?

Many thanks,

Axel

5 Replies
vapukov
Master II

Hi Axel,

What's wrong with tHiveOutput?

Regards, Vlad

Anonymous
Not applicable
Author

Hi Vlad, thanks for your reply. Are you saying that connecting tFileInputDelimited to tHiveOutput should work fine when I want the Hive table in Parquet format? Sorry, I'm fairly new to Talend.
vapukov
Master II

Why not just test it?
It supports the Parquet format.
Anonymous
Not applicable
Author

I tested it, but I get the error "PartialGroupNameException Does not support partial group name resolution on Windows. Incorrect command line arguments."

Any clue what this means?

chabou19
Contributor II

Hi,

If your Hive setup uses Kerberos authentication ... you must ensure it is correctly configured in Talend.

Best Regards.