I'm having serious issues with writing a date in the correct format to a parquet file. Here's the situation:
I read an XML file containing a date such as "2019-09-24". In tMap I set the type to Date with the pattern "yyyy-MM-dd", and the value comes through correctly. I then write it unchanged to a Parquet file. The OutputParquet component offers the same data types as Talend, so I'm assuming the value is saved in the same format.
The problem appears when I try to read this file as an external table from Impala. In the DDL I have to use TIMESTAMP, as Impala doesn't support DATE as a data type. I've tried every format imaginable, on both source and target, but I can't get it to work. I've even tried setting all fields to STRING, and it still won't let me read the file.
Error message is something like this:
"0.parquet' has an incompatible Parquet schema for column 'stage.cdc_current_pure_application.submission_date'. Column type: TIMESTAMP, Parquet schema: optional float budget_diff [i:16 d:1 r:0]"
Can somebody please tell me how to correctly store a DATE in a parquet file and how to correctly read it as an external table from Impala?
Thanks
Hi,
Could you please try converting the value to a Timestamp before writing the data to the Parquet file? Once the data is stored as a proper Timestamp, it should read back fine in later stages.
Try the tConvertType component, or custom code in a tJavaRow, to convert the data to a Timestamp in the preferred format.
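As a rough illustration of the custom-code route, here is a minimal sketch in plain Java of turning a "yyyy-MM-dd" string into a java.sql.Timestamp at midnight. The class name DateToTimestamp and the helper toTimestamp are made up for this example, and midnight is an assumed time of day. In a tJavaRow you would presumably call something like this per row. Whether the Parquet output component then writes a type that Impala accepts as TIMESTAMP is not confirmed in this thread, so treat it as something to test.

```java
import java.sql.Timestamp;
import java.time.LocalDate;

// Hypothetical helper, not part of Talend: converts an ISO date string
// (yyyy-MM-dd, as in the XML source) to a java.sql.Timestamp at 00:00:00.
class DateToTimestamp {

    static Timestamp toTimestamp(String isoDate) {
        // Pass empty input through as null so nullable columns stay nullable.
        if (isoDate == null || isoDate.trim().isEmpty()) {
            return null;
        }
        // LocalDate.parse uses the ISO format yyyy-MM-dd by default
        // and throws DateTimeParseException on malformed input.
        LocalDate date = LocalDate.parse(isoDate.trim());
        // Assumption: midnight in the JVM's default time zone is acceptable.
        return Timestamp.valueOf(date.atStartOfDay());
    }

    public static void main(String[] args) {
        System.out.println(toTimestamp("2019-09-24")); // 2019-09-24 00:00:00.0
    }
}
```

Since java.sql.Timestamp extends java.util.Date, the result can also be assigned to a Talend column of type Date if the schema requires it.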
Warm Regards,
Nikhil Thampi
Hi,
In addition to what @nthampi said, it would be good if you could share your job design and describe exactly what problem you are seeing.
regards, Vlad