Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
To achieve desired results, it's often necessary to perform joins between data sources. tMap is a transformation component that facilitates joins, including both inner and outer joins. In this guide, you will learn to perform an inner join on tMap and manage both matched and rejected data.
This procedure was written with:
Talend verified this procedure to be compatible with all versions of Talend Studio.
In this example, there are two data sources: a text file named in.txt and a table named person in MySQL database. The data of each data source are read as follows:
in.txt
id;name;gender
1;Shong;boy
2;Ross;boy
3;Sabrina;girl
4;Elisa;girl
person table
user_id | name | country | |
---|---|---|---|
1 | Shong | CN | shong@qlik.com |
4 | Elisa | FR | elisa@qlik.com |
This example reads the data from the in.txt file as a main flow, and then does an inner join with the data read from the person table on a tMap component based on the id column from the file and the user_id column from the table. The join will get rows, including the external columns from lookup table, if the id already exists in the table, and will reject rows if the id does not exist in the table.
Here is the expected output:
Matched rows:
id | name | gender | |
---|---|---|---|
1 | Shong | boy | shong@qlik.com |
4 | Elisa | girl | elisa@qlik.com |
Unmatched rows:
id | name | gender |
---|---|---|
2 | Ross | boy |
3 | Sabrina | girl |
Create a demo Job called InnerJoinDemo. The Job design is as follows:
tFileInputDelimited_1: reads data from in.txt file.
tDBInput_1: reads data from person table.
tMap_1: performs an inner join and generates two output tables.
tLogRow_1 and tLogRow_2: print output data to the console.
The following images show the detailed component settings:
tFileInputDelimited_1:
tDBInput_1:
tMap_1: performs an inner join between the two incoming data flows, and generates two output tables: one table for the matched rows, and another for the unmatched rows.
tLogRow_1, tLogRow_2: print the output data to the console with table model.
Execute the Job. The console shows the following results:
Starting job InnerJoinDemo at 17:37 05/08/2024.
[statistics] connecting to socket on port 3960
[statistics] connected
.--+-----+------+--------------.
| tLogRow_1 |
|=-+-----+------+-------------=|
|id|name |gender|email |
|=-+-----+------+-------------=|
|1 |Shong|boy |shong@qlik.com|
|4 |Elisa|girl |elisa@qlik.com|
'--+-----+------+--------------'
.--+-------+------.
| tLogRow_2 |
|=-+-------+-----=|
|id|name |gender|
|=-+-------+-----=|
|2 |Ross |boy |
|3 |Sabrina|girl |
'--+-------+------'
[statistics] disconnected
Job InnerJoinDemo ended at 17:37 05/08/2024. [Exit code = 0]
The results show that all matched rows are output into tLogRow_1 and all unmatched rows are output into tLogRow_2.