PhilHibbs
Creator II

HDFS file to Hive table - file format mismatch

I have a job which creates a Hive table, transfers a file to HDFS, and loads the data from the file into the Hive table. At least, that's what I want it to do.

 

It falls down at the final step, with this error:

Error while compiling statement: FAILED: SemanticException Unable to load data to destination table. Error: The file that you are trying to load does not match the file format of the destination table.

 

I'm trying a super-minimal case with the table just having a single integer column, and the file just containing the number 3 and a newline.

 

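In plain HiveQL, the minimal case described above amounts to something like the following (table name and HDFS path are invented for illustration):

```sql
-- One INT column; the table must be a plain text table to accept a text file.
CREATE TABLE IF NOT EXISTS minimal_test (n INT)
STORED AS TEXTFILE;

-- The file contains just the number 3 and a newline.
LOAD DATA INPATH '/tmp/minimal_test.txt' INTO TABLE minimal_test;
```

The "file format mismatch" error typically means the table was created with a storage format (e.g. ORC or Parquet) that does not match the plain text file being loaded.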

7 Replies
vapukov
Master II

You use "create table if not exists".

First, of course, check: does the table have the same structure as the generic schema?

Then: does the table have the same file format as the file (plain text, not Parquet, etc.)?

And does the table use the same delimiters as the file?
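A quick way to run these checks (a sketch, assuming a table named my_table) is:

```sql
-- Shows the table's columns, InputFormat/OutputFormat, SerDe and
-- field delimiters, all of which must match the file being loaded.
DESCRIBE FORMATTED my_table;
```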

 

manodwhb
Champion II

@PhilHibbs, make sure the schema of the Hive table matches the schema of the HDFS file. Also, you should specify the same path that you used in tHDFSPut, since you are reading back the same file that you loaded into HDFS.

PhilHibbs
Creator II
Author

I got this working, but I'm not 100% sure what the problem was.

The issue now is that I could only get it working by not having any delimiters (by which I mean quotes, not the comma separator) or escape characters. If I tick the "Escape" box in the tHiveCreateTable component, I get this error:

Error while compiling statement: FAILED: ParseException line 2:20 character '<EOF>' not supported here



 

My ultimate objective is to be able to load an email address such as a","a!#$%&'*+-/=?^_`{|}~@aaa.net into a Hive table.

PhilHibbs
Creator II
Author

The essential problem seems to be that the tHiveCreateTable has no equivalent to "Text Enclosure" in a tFileOutputDelimited. Although it says "Set Delimited row format", you can only specify a separator, not a delimiter, and the Escape doesn't work (or, I can't get it to work).
PhilHibbs
Creator II
Author

I got escaping to work! You need to quadruple the backslash, so it appears in the tHiveCreateTable component as "\\\\".

 

https://jira.talendforge.org/browse/TBD-7964 created, as this feels like a bug to me. It certainly needs to be documented!
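A plausible reading of why four backslashes are needed (my assumption, not confirmed by Talend documentation): the value is unescaped twice, once as a Java string literal and once by the Hive DDL parser, so "\\\\" collapses to a single backslash by the time Hive sees it:

```sql
-- Typed in the component:  "\\\\"   (Java string literal -> two characters: \\)
-- Emitted into the DDL:    ESCAPED BY '\\'   (Hive unescapes -> one character: \)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  ESCAPED BY '\\'
```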

PhilHibbs
Creator II
Author

I'm struggling with this again. I thought I got it working a while back, but I can't get it working now!

My problem is with a comma in the data. For example, this line of data in the file:

"2019-05-16T10:05:44.399Z","12","400","{ \"statusCode\": \"400\", \"details": \"Schema validation error\" }"

The last column gets truncated at the first comma, so all I get is { "statusCode": "400"

Or this, I can reformat the file if needed:

"2019-05-16T10:05:44.399Z","12","400","{ ""statusCode"": ""400"", ""details"": ""Schema validation error"" }"

So I don't mind whether the load file needs Excel-style CSV quoting or C/Java-style escaping; either will do, but it needs to be able to load quotes, commas, etc.

Like I said earlier, the escaping can be done by quadrupling the slash: "\\\\"

However, there is no quote specifier in the tHiveCreateTable component.

Is it possible with Serde row format?
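It may be: Hive ships with OpenCSVSerde, which understands Excel-style quoting. A sketch (table and column names invented; note this SerDe treats every column as STRING, and tHiveCreateTable may not emit it, so the DDL might have to go through a query component instead):

```sql
CREATE TABLE IF NOT EXISTS api_log (
  ts STRING, id STRING, status STRING, body STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;
```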

PhilHibbs
Creator II
Author

I have worked around it by writing out the file without quotes, so my row looks like this:

2019-05-16T10:05:44.399Z,12,GB,BF0073,400,{ "statusCode": "400"\, "details": "Schema validation error" }

I had to manually code the escaping of the comma, so every column that might contain a comma has to have .replaceAll(",", "\\\\,") applied to it before writing. Not ideal.
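For what it's worth, this manual backslash-escaping pairs with an ESCAPED BY clause on the table, so Hive un-escapes the \, on load. A sketch with invented column names matching the six fields in the row above:

```sql
CREATE TABLE IF NOT EXISTS api_log (
  ts STRING, id STRING, country STRING,
  ref STRING, status STRING, body STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  ESCAPED BY '\\'   -- '\\' in the DDL is a single literal backslash
STORED AS TEXTFILE;
```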