H1694942638
Partner - Contributor II

tMap Inner Join on Casted Columns Gives Fewer Results Than Impala Query

Hi there, 

I'm facing a puzzling issue where a job's row count doesn't match the result from an equivalent direct SQL query. It seems to be happening within the tMap component during an inner join that involves columns of different data types.


Talend Version

Talend Big Data Platform 8


My Talend Job Setup

1. Input 1 (tImpalaInput_1) : Retrieves data from t1. The query includes casting a DOUBLE to a STRING to prepare for the join.
* t1.ref (data type: DOUBLE)
* t1.customer (data type: STRING)
* Query Snippet: CAST(ref AS STRING) AS ref_str FROM t1

2. Input 2 (tImpalaInput_2) : Retrieves data from t2. The query casts an INT to a STRING.
* t2.ref (data type: STRING)
* t2.customer (data type: INT)
* Query Snippet: CAST(customer AS STRING) AS customer_str FROM t2

3. tMap: Performs an inner join between the two inputs.
* Join Condition: t1.ref_str = t2.ref AND t1.customer = t2.customer_str

 

The Problem & My Observations

The final output of the tMap has a lower row count than when I run the equivalent INNER JOIN query directly in the Impala shell.

* I have already verified that the row counts coming from each individual tImpalaInput component perfectly match the row counts from a SELECT COUNT(*) on each table in Impala. The data is being read correctly into Talend.
* The discrepancy first appears at the tMap join. To be sure, I removed all filters from the tMap, leaving only the plain inner join, but the problem remains: the tMap output count still differs from the Impala count (and, of course, from the filtered output).

I guess that there's a difference in how tMap is comparing the strings versus how Impala handles the join on the casted values directly in a single query.

What could be causing this in the join logic? Is there a better way to handle these multi-type joins in tMap to ensure a match with SQL behavior?
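
One possible cause worth ruling out, stated here as an assumption rather than a diagnosis, is the Match Model of the tMap lookup. With "Unique match", each main row is joined to at most one lookup row (the last one loaded for that key), whereas a SQL INNER JOIN emits one row per matching pair. If join keys repeat in the lookup input, that alone would produce fewer rows. The Java sketch below is a hypothetical illustration of the two behaviours; the class, method names, and sample rows are invented, and it is not the code tMap generates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: NOT the code generated by tMap.
final class JoinMatchModels {

    // Composite join key built from (ref, customer), both already strings.
    static String key(String ref, String customer) {
        return ref + "\u0001" + customer; // separator assumed not to occur in the data
    }

    // SQL-style inner join: each main row pairs with every lookup row sharing its key.
    static int countAllMatches(List<String[]> main, List<String[]> lookup) {
        Map<String, Integer> lookupCounts = new HashMap<>();
        for (String[] row : lookup) {
            lookupCounts.merge(key(row[0], row[1]), 1, Integer::sum);
        }
        int total = 0;
        for (String[] row : main) {
            total += lookupCounts.getOrDefault(key(row[0], row[1]), 0);
        }
        return total;
    }

    // "Unique match"-style join: each main row yields at most one output row.
    static int countUniqueMatch(List<String[]> main, List<String[]> lookup) {
        Map<String, String[]> lastByKey = new HashMap<>();
        for (String[] row : lookup) {
            lastByKey.put(key(row[0], row[1]), row); // later duplicates overwrite earlier ones
        }
        int total = 0;
        for (String[] row : main) {
            if (lastByKey.containsKey(key(row[0], row[1]))) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String[]> main = new ArrayList<>();
        main.add(new String[] {"100", "7"});
        main.add(new String[] {"200", "8"});

        List<String[]> lookup = new ArrayList<>();
        lookup.add(new String[] {"100", "7"});
        lookup.add(new String[] {"100", "7"}); // duplicate key in the lookup input
        lookup.add(new String[] {"200", "8"});

        System.out.println("all matches:  " + countAllMatches(main, lookup));
        System.out.println("unique match: " + countUniqueMatch(main, lookup));
    }
}
```

If duplicate keys in the lookup turn out to be the cause, setting the lookup's Match Model to "All matches" in the tMap editor should bring the row count in line with the Impala result.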

 

Side Question

For debugging purposes in Talend Open Studio, is it possible to access or view the underlying Java source code that tMap generates for the join operation?

Thanks in advance!

1 Reply
Shicong_Hong
Employee

Hi

To check whether the join keys really match, print the data to the console just before the tMap and compare the rows from both inputs. This shows what the data looks like after it has been read from the tables and transformed by the casts.
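
The comparison can also be done programmatically once the key values from each input have been exported. The following Java sketch is a minimal example under stated assumptions: the keys are available as plain string lists, and the class, method names, and sample values are hypothetical. It lists main-side keys with no exact counterpart on the lookup side, then flags those that would match once surrounding whitespace is trimmed, which is one kind of difference that is hard to see in console output.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for inspecting join keys exported from the two inputs.
final class JoinKeyDiff {

    // Wraps a value in brackets so leading/trailing whitespace becomes visible.
    static String visible(String s) {
        return s == null ? "<null>" : "[" + s + "]";
    }

    // Main-side keys that have no exact (equals) match among the lookup keys.
    static Set<String> unmatched(List<String> mainKeys, List<String> lookupKeys) {
        Set<String> lookup = new HashSet<>(lookupKeys);
        Set<String> missing = new LinkedHashSet<>();
        for (String k : mainKeys) {
            if (!lookup.contains(k)) {
                missing.add(k);
            }
        }
        return missing;
    }

    // Unmatched keys that WOULD match if both sides were trimmed first.
    static Set<String> matchOnlyAfterTrim(List<String> mainKeys, List<String> lookupKeys) {
        Set<String> trimmedLookup = new HashSet<>();
        for (String k : lookupKeys) {
            if (k != null) {
                trimmedLookup.add(k.trim());
            }
        }
        Set<String> result = new LinkedHashSet<>();
        for (String k : unmatched(mainKeys, lookupKeys)) {
            if (k != null && trimmedLookup.contains(k.trim())) {
                result.add(k);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample keys; "200 " carries a trailing space.
        List<String> mainKeys = List.of("100", "200 ", "300");
        List<String> lookupKeys = List.of("100", "200", "301");

        for (String k : unmatched(mainKeys, lookupKeys)) {
            System.out.println("no exact match: " + visible(k));
        }
        for (String k : matchOnlyAfterTrim(mainKeys, lookupKeys)) {
            System.out.println("matches after trim: " + visible(k));
        }
    }
}
```

Keys reported by `matchOnlyAfterTrim` point to a formatting difference between the two inputs; keys that remain unmatched even after trimming are genuinely absent from the other side.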

Regards

Shicong