I have a job which creates a Hive table, transfers a file to HDFS, and loads the data from the file into the Hive table. At least, that's what I want it to do.
It falls down at the final step, with this error:
Error while compiling statement: FAILED: SemanticException Unable to load data to destination table. Error: The file that you are trying to load does not match the file format of the destination table.
I'm trying a super-minimal case with the table just having a single integer column, and the file just containing the number 3 and a newline.
You use CREATE TABLE IF NOT EXISTS, so first check a few things:
- Does the table have the same structure as the Generic schema?
- Does the table have the same storage format as the file (plain text file, not Parquet, etc.)?
- Does the table use the same delimiters as the file?
@PhilHibbs, make sure the schema of the Hive table matches the HDFS file. Also check that the load uses the same path you specified in tHDFSPut, since you are reading back the same file that you loaded into HDFS.
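In my experience that "does not match the file format" error typically means the table was created with a binary storage format (ORC, Parquet) while the load file is plain text. A minimal sketch of DDL that would match the single-integer test file described above, built as a Java string since that is what a Talend job generates (table and column names here are invented for illustration):

```java
public class MinimalHiveDdl {
    // Hypothetical table/column names; the point is declaring plain-text
    // storage explicitly so LOAD DATA accepts a plain-text file.
    static final String DDL =
        "CREATE TABLE IF NOT EXISTS test_table (n INT) " +
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' " +
        "STORED AS TEXTFILE";

    public static void main(String[] args) {
        System.out.println(DDL);
    }
}
```

If the tHiveCreateTable component was left on a default of ORC or Parquet, switching its format to text file should make the LOAD DATA step succeed.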
I got this working, but I'm not 100% sure what the problem was.
The issue now is that I could only get it working by not having any delimiters (by which I mean quotes, not the comma separator) or escape characters. If I tick the "Escape" box in the tHiveCreateTable component, I get this error:
Error while compiling statement: FAILED: ParseException line 2:20 character '<EOF>' not supported here
My ultimate objective is to be able to load an email address such as a","a!#$%&'*+-/=?^_`{|}~@aaa.net into a Hive table.
I got escaping to work! You need to quadruple the backslash, so it appears in the tHiveCreateTable component as "\\\\".
https://jira.talendforge.org/browse/TBD-7964 created as this feels like a bug to me. Certainly needs to be documented!
I'm struggling with this again. I thought I got it working a while back, but I can't get it working now!
My problem is with a comma in the data. For example, this line of data in the file:
"2019-05-16T10:05:44.399Z","12","400","{ \"statusCode\": \"400\", \"details\": \"Schema validation error\" }"
The last column gets truncated at the first comma so all I get is { "statusCode": "400"
Or this, I can reformat the file if needed:
"2019-05-16T10:05:44.399Z","12","400","{ ""statusCode"": ""400"", ""details"": ""Schema validation error"" }"
So I don't mind if the load file needs Excel-style CSV quoting, or C/Java-style escaping; either will do, but it needs to be able to load quotes, commas, etc.
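If the file writer is under your control, Excel-style quoting (wrap the field in double quotes and double any embedded quote) is easy to generate. A sketch, with a made-up helper name:

```java
public class CsvQuote {
    // Wrap a field in double quotes and double any embedded quotes,
    // RFC 4180 / Excel style. Helper name invented for illustration.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        String json = "{ \"statusCode\": \"400\" }";
        System.out.println(quote(json));
        // → "{ ""statusCode"": ""400"" }"
    }
}
```

Whether Hive honors this quoting depends on the table's SerDe, though: the default LazySimpleSerDe has no notion of quoted fields and will still split on every comma.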
Like I said earlier, the escaping can be done by quadrupling the slash: "\\\\"
However, there is no quote specifier in the tHiveCreateTable component.
Is it possible with Serde row format?
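In plain HiveQL, yes: Hive's bundled OpenCSVSerde understands Excel-style quoted CSV, including commas inside quoted fields. Whether tHiveCreateTable can be made to emit it is another question; you may have to issue the DDL yourself, e.g. via a tHiveRow. A sketch of the statement as a Java string (table and column names invented):

```java
public class CsvSerdeDdl {
    // OpenCSVSerde parses quoted, comma-containing fields.
    // Caveat: it exposes every column as STRING, so typed columns
    // need casting downstream. Names here are illustrative only.
    static final String DDL =
        "CREATE TABLE IF NOT EXISTS api_log (\n" +
        "  event_time STRING, id STRING, status STRING, details STRING)\n" +
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n" +
        "WITH SERDEPROPERTIES (\n" +
        "  'separatorChar' = ',',\n" +
        "  'quoteChar'     = '\"',\n" +
        "  'escapeChar'    = '\\\\')\n" +
        "STORED AS TEXTFILE";

    public static void main(String[] args) {
        System.out.println(DDL);
    }
}
```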
I have worked around it by writing out the file without quotes, so my row looks like this:
2019-05-16T10:05:44.399Z,12,GB,BF0073,400,{ "statusCode": "400"\, "details": "Schema validation error" }
I had to manually code the escaping of the comma, so every column that might contain a comma has to have .replaceAll(",", "\\\\,") applied to it before writing. Not ideal.
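For the record, here is what that workaround does to a field. The replacement string needs its own doubling because backslash is special both in Java string literals and in replaceAll replacement text:

```java
public class CommaEscape {
    // "\\\\," is the two-backslash runtime string \\, ; replaceAll
    // treats \\ in the replacement as one literal backslash, so each
    // comma in the field becomes \, in the output.
    static String escapeCommas(String field) {
        return field.replaceAll(",", "\\\\,");
    }

    public static void main(String[] args) {
        String json = "{ \"statusCode\": \"400\", \"details\": \"Schema validation error\" }";
        System.out.println(escapeCommas(json));
        // the comma between the two JSON members comes out as \,
    }
}
```

This matches the ESCAPED BY backslash configured on the table, which is why Hive stops splitting on the escaped commas.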