Unexpected null schema column value returned by tHiveInput query in a Spark Job

TalendSolutionExpert

Last Update: Jan 22, 2024 9:35:30 PM
Updated By: Jamie_Gregory
Created date: Apr 1, 2021 6:21:34 AM

Talend Version: 6.3.1
Product: Big Data

Problem Description

When querying a Hive table with a tHiveInput component in a Big Data batch Spark Job, an unexpected null column value is returned.

 

To illustrate this issue, consider a Hive table named hivetable that has two columns, id and name, and one row:

 

create table hivetable (id int, name string);
insert into hivetable values (1,"one");

A Spark Job is composed of the following tHiveInput and tLogRow components, as shown:

 

0693p000008u8C4AAI.png

When executing the Spark Job to query the hivetable table, the Name column has the value null, as shown by the tLogRow output:

.--+----.
|tLogRow_1|
|=-+---=|
|id|Name|
|=-+---=|
|1 |null|
'--+----'

You would expect the Name column value to be one instead of null.

Problem root cause

When the Avro payload for the tHiveInput output is created, the column names are always written in lowercase. In this example, the tHiveInput schema defines the column as Name (capitalized, as the screenshot shows), which does not match the lowercase name in the payload, so the column value is returned as null.

 

Note: You may have a similar issue when using components other than tLogRow, such as tFileOutputDelimited.
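
The mismatch described in the root cause can be illustrated with a minimal Python sketch. It is only an analogy, not Talend's or Avro's actual code: a plain dictionary stands in for a record whose field names were lowercased, and the hypothetical read_columns function looks up each schema column by its exact, case-sensitive name.

```python
# Illustrative analogy only: a dict stands in for a record whose field
# names are lowercase. This is not the code that Talend generates.
record = {"id": 1, "name": "one"}


def read_columns(record, schema_columns):
    """Look up each schema column by its exact (case-sensitive) name.

    A column whose name has no exact match in the record yields None,
    which is how the null value in the tLogRow output can be pictured.
    """
    return {col: record.get(col) for col in schema_columns}


capitalized = read_columns(record, ["id", "Name"])
lowercase = read_columns(record, ["id", "name"])

print(capitalized)  # {'id': 1, 'Name': None}
print(lowercase)    # {'id': 1, 'name': 'one'}
```

The lookup with the capitalized Name finds nothing, while the lowercase name returns the stored value.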

Solution or Workaround

Ensure that all schema column names in the tHiveInput component are lowercase.

0693p000008u8gLAAQ.png
 

In this example, when the tHiveInput schema column names id and name are lowercase, tHiveInput returns the expected row values:

 

.--+----.
|tLogRow_1|
|=-+---=|
|id|name|
|=-+---=|
|1 |one |
'--+----'

The name column value is now one instead of null.
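
When a schema has many columns, it may help to list the names that still contain uppercase characters before running the Job. The following Python sketch assumes the column names have already been copied out of the tHiveInput schema by hand. The non_lowercase_columns helper is hypothetical and is not part of Talend.

```python
def non_lowercase_columns(schema_columns):
    """Return the column names that are not entirely lowercase."""
    return [col for col in schema_columns if col != col.lower()]


# Column names copied by hand from the schema in this article's example.
print(non_lowercase_columns(["id", "Name"]))  # ['Name']
print(non_lowercase_columns(["id", "name"]))  # []
```

An empty result means that every listed column name already follows the workaround.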
