
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[resolved] Join to 1 or the other table
How do I do this in Talend Studio? What components will allow me to get the desired results?[color=#000000][font=Arial, Helvetica, sans-serif] I expect there is a recommend method to accomplish this task. [/font][/color]
I want to create a CSV file with data always from table A. Also based on a value in each row from table A get a row from table B or C. There are 2 values of concern in table A. ID and alternate number (alt_no). In all cases there will be a row in B where A.ID = A.ID Yet when A.alt_no is > 0 [color=#000000][font=Arial, Helvetica, sans-serif] we want values from the alternate table C where C.ID = A.ID and C.alt_no = A.alt_no. Furthermore the CSV file will have 1 column from table B or C. For example get the value for CSV column name city from B or C depending on which one was used. [/font][/color]
My 1st idea is to use tJDBCInput for the 3 tables and have them go into tMap. Then use the expression editor to select the desired rows with a left outer join. I have not discovered a way to build the expression to work this way.
Talend integration cloud SaaS edition of Talend studio version 6.3
Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
SELECT A.*, B.CITY
FROM TABLEA A
INNER JOIN TABLEB B
ON A.ID = B.ID
WHERE A.ALT_NO <= 0
UNION ALL
SELECT A.*, C.CITY
FROM TABLEA A
INNER JOIN TABLEC C
ON A.ID = C.ID
AND A.ALT_NO = C.ALT_NO
WHERE A.ALT_NO > 0
If that is the type of results you are looking for, stick that query in a SQLInput component, add a tMap if any other data manipulation is needed and add you CSVOutputFile component.
Thanks,
Scott

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I want to create a CSV file with data always from table A. Also based on a value in each row from table A get a row from table B or C. There are 2 values of concern in table A. ID and alternate number (alt_no). In all cases there will be a row in B where A.ID = A.ID Yet when A.alt_no is > 0 we want values from the alternate table C where C.ID = A.ID and C.alt_no = A.alt_no. Furthermore the CSV file will have 1 column from table B or C. For example get the value for CSV column name city from B or C depending on which one was used.
Could you please elaborate your case with an example with input and expected output values?
Best regards
Sabrina

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
SELECT A.*, B.CITY
FROM TABLEA A
INNER JOIN TABLEB B
ON A.ID = B.ID
WHERE A.ALT_NO <= 0
UNION ALL
SELECT A.*, C.CITY
FROM TABLEA A
INNER JOIN TABLEC C
ON A.ID = C.ID
AND A.ALT_NO = C.ALT_NO
WHERE A.ALT_NO > 0
If that is the type of results you are looking for, stick that query in a SQLInput component, add a tMap if any other data manipulation is needed and add you CSVOutputFile component.
Thanks,
Scott

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I recall from somewhere that it is better to split up the table access in the job with multiple tJDBCInput components. That is the main reason I was trying to join or merge the data with tMap. At this point with this job I will take whatever works. I think that idea was in the context of planning to deploy the job with TIC - Taland Integration Cloud.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I think as a general ETL architecture design, I agree with what you read: it should be split up for maintainability reasons and to put the business logic as part of the design. One thing I have learned about Talend as compared to other Gartner Q1 ETL tools, is that Talend is not as efficient in this regard. Most tools recognize the use of tables from the same datastore/database and generate the necessary SQL code to make it as efficient as possible and to push down most of the heavy lifting to the database. This frees up precious memory to do actual transformations of your data. Talend, on the otherhand, just generates java code on a component by component basis and will, too many times, use a lot of server resources that could otherwise be done at the db level. Unless they are on different sources, I use pure SQL and then manipulate/transform as needed in the job. Even if they are different source databases, I usually (depending on the size) load into temp tables on the target DB and use the ELT components or SQL based query to derive my true source recordset. Performance and optimization is just as important to ETL architecture as is the defined business logic for the data integration.
Thanks,
Scott

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks,
Scott

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
