Case sensitivity difference for tHiveInput when running in a DI Job versus a Spark Job

TalendSolutionExpert
Contributor II


Last Update: Feb 9, 2024 1:22:49 PM
Updated By: Jamie_Gregory
Created date: Apr 1, 2021 6:18:49 AM

Problem

When a tHiveInput component runs in a standard Job, a query containing a mix of lowercase and uppercase column names returns all columns, even if the schema contains only lowercase column names. When the same query and schema run in a Big Data Batch Job, the column whose name is uppercase in the query is not returned.

 

Root Cause

This is a known issue, and Hive itself is not at fault. Hive is not case sensitive: it always uses lowercase column names, regardless of the case used in the Studio. The difference between a DI Job and a Spark Job is that Spark uses Avro.

 

With Hive, you can request fields in any case, but Spark uses Avro, which is case sensitive. Moreover, Avro field names are created with the case used in the Hive query, while other components retrieve fields using the case defined in the Studio schema.

 

This means that if the field names in your Hive query do not exactly match the Studio schema, the values are retrieved from Hive but cannot be found in the Avro payload.

 

Example:

  1. Studio schema: col1, COL2

  2. In Hive, it gives: col1, col2

  3. Back in Studio with a Hive query of "SELECT col1, col2 FROM ..." retrieves col1, col2 columns

  4. Creation of an Avro payload with col1, col2 fields

  5. Then Studio tries to find col1, COL2 fields

  6. Result: col1, null (COL2 is not found in the Avro payload)
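The steps above can be sketched in plain Python (column names are illustrative): the Avro payload is keyed by the case used in the Hive query, while the lookup uses the Studio schema's case, so the uppercase field misses.

```python
# Sketch of the failing case: Avro field names take the query's case,
# but Studio looks fields up with the schema's case.

query_columns = ["col1", "col2"]     # case as written in the Hive query
studio_schema = ["col1", "COL2"]     # case as defined in the Studio schema

# The Avro payload keeps the query's case for its field names
avro_record = {name: f"value_of_{name}" for name in query_columns}

# Studio retrieves fields with the schema's case; "COL2" is not found
retrieved = [avro_record.get(name) for name in studio_schema]
print(retrieved)  # ['value_of_col1', None]
```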

Doing it the Avro way:

  1. Studio schema: col1, COL2

  2. In Hive, it gives: col1, col2

  3. Back in Studio with a Hive query of "SELECT col1, COL2 FROM ..." retrieves col1, col2 columns

  4. Creation of an Avro payload with col1, COL2 fields

  5. Then Studio tries to find col1, COL2 fields

  6. Result: col1, COL2 (both fields are found)
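The same sketch for the "Avro way" (column names again illustrative): writing the query with the Studio schema's exact case makes the Avro field names match the lookup, so both values are found.

```python
# Sketch of the working case: the query uses the schema's exact case,
# so the Avro field names and the Studio lookup agree.

query_columns = ["col1", "COL2"]     # query written with the schema's case
studio_schema = ["col1", "COL2"]

avro_record = {name: f"value_of_{name}" for name in query_columns}
retrieved = [avro_record.get(name) for name in studio_schema]
print(retrieved)  # ['value_of_col1', 'value_of_COL2']
```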

 

Solution

In a Spark Job, the case of the column names in the query must match the case in the schema.

 

Workaround

Another workaround is to use only lowercase column names, in both the query and the schema.
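A minimal sketch of the lowercase-only workaround (the helper function is hypothetical): normalizing every column name to lowercase up front means the query, the Avro payload, and the schema can never disagree on case.

```python
def normalize(columns):
    """Lowercase all column names so the query and schema always agree."""
    return [c.lower() for c in columns]

studio_schema = normalize(["col1", "COL2"])   # ['col1', 'col2']
query_columns = normalize(["col1", "COL2"])

# With everything lowercase, the Avro field lookup can never miss
avro_record = {name: f"value_of_{name}" for name in query_columns}
assert all(name in avro_record for name in studio_schema)
```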
