Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I'm new to talend and looking to join two tables. I used the tMap component and it worked fine, but then I looked into it. I have two tables, lets say table A and table B. When I drag them into the designer they're recognized as MySqlInputs which is what I want. However, when I click the actual table components, I can see the query that they're running. Both tables are just running a SELECT fields FROM table query.
From what I've read online you would take these two components and route them into the tMap. However, this doesnt really make sense to me. Why am I selecting ALL of the rows from both tables, then letting talend do the join? It would be significantly faster if I just told the actual database to do the join, and then play around with the results. Is there anyway to join two or more tables without having to SELECT every single record, then let talend handle the joining with a tMap component?
Hi,
Your thoughts are absolutely correct and it is a common mistake done by many ETL developers. Any ETL tool including Talend is not a replacement for database table joins. So if you are pretty sure that your dataset is a join of two or more tables, feel free to write the SQL joining all the required tables and add the query inside tDBInput component. You can add one of the tables you are using for joining in the table name component of tDBInput.
The advantages of this methodology is that you are reducing the result set from source level itself. For example, you have a table with 1 million records and 30 columns. But you need only last day's records which might be approximately 10,000 records (after joining with other tables or by adding necessary filter conditions) and you are looking for only 5 columns out of 30. In this case, you should always apply the filter at the source to reduce the data traffic to later stages including to Talend.
tMap is a component used to do joins when you do not have the opportunity to do it in previous level and it is also used to join data sets from multiple source systems (for example data flowing from a file and DB). I hope I have given answer to your query in detailed manner. Enjoy programming in Talend 🙂
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hi,
Your thoughts are absolutely correct and it is a common mistake done by many ETL developers. Any ETL tool including Talend is not a replacement for database table joins. So if you are pretty sure that your dataset is a join of two or more tables, feel free to write the SQL joining all the required tables and add the query inside tDBInput component. You can add one of the tables you are using for joining in the table name component of tDBInput.
The advantages of this methodology is that you are reducing the result set from source level itself. For example, you have a table with 1 million records and 30 columns. But you need only last day's records which might be approximately 10,000 records (after joining with other tables or by adding necessary filter conditions) and you are looking for only 5 columns out of 30. In this case, you should always apply the filter at the source to reduce the data traffic to later stages including to Talend.
tMap is a component used to do joins when you do not have the opportunity to do it in previous level and it is also used to join data sets from multiple source systems (for example data flowing from a file and DB). I hope I have given answer to your query in detailed manner. Enjoy programming in Talend 🙂
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂