Dear All,
I am new to Talend and need to understand joins in Talend.
I am building a Job that joins two or three tables with the tMap component.
My question: if I can write the join query in the tPostgresqlInput component and get the same result, why would I use tMap to join the tables?
Could anyone explain the real difference between a normal join query in tPostgresqlInput and a tMap join? Does it have any bearing on memory management or Job execution time (processing speed)?
Hi @mac_vardam07,
Greetings of the day,
Welcome to Talend. Joins in Talend can be done in several ways: as you pointed out, you can perform them in the database components, or in Talend components such as tMap or tJoin.
Let me give you a clear idea:
Apart from the DB components, two components can perform joins. One is tJoin, which performs a left outer join by default, or an inner join that also captures the rejected records.
The other is tMap, whose lookup settings offer inner and left outer joins; right outer and full outer joins are not offered directly (a right outer join can be obtained by swapping the main and lookup flows). tMap costs some performance and can consume a lot of memory: it generates fairly complex code, and by default it loads the lookup rows into memory before processing the main flow, so memory usage grows with the size of the lookup table.
For example, to join Table A and Table B with tMap, you define the key column and the join type inside tMap as required.
DEMO SCENARIO:
TABLE_A ---------->
                    tMap ----------> OUTPUT_COMPONENT
TABLE_B ---------->
Number of components used: 4 (including inputs and output).
The join and any other required logic are defined in tMap.
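To make the memory point concrete, here is a minimal conceptual sketch of a lookup-style join done in the Job rather than in the database. It is not the code Talend generates (Talend generates Java); the function name `lookup_join`, the sample tables and the column names are all made up for illustration. It only shows the general idea: the lookup side is indexed in memory first, then the main rows are streamed against that index.

```python
from collections import defaultdict


def lookup_join(main_rows, lookup_rows, key, inner=False):
    """Join main_rows to lookup_rows on `key` using an in-memory index.

    Returns (joined, rejects). With inner=False, unmatched main rows are
    kept without lookup columns (left outer join); with inner=True they
    are routed to the rejects list instead.
    """
    # The whole lookup side is held in memory; this is the part whose
    # footprint grows with the size of the lookup table.
    index = defaultdict(list)
    for row in lookup_rows:
        index[row[key]].append(row)

    joined, rejects = [], []
    # The main flow is then processed row by row against the index.
    for row in main_rows:
        matches = index.get(row[key], [])
        if matches:
            for match in matches:
                joined.append({**match, **row})
        elif inner:
            rejects.append(row)
        else:
            joined.append(dict(row))
    return joined, rejects


# Hypothetical sample data standing in for TABLE_A and TABLE_B.
table_a = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "carol"},
]
table_b = [
    {"id": 1, "city": "Pune"},
    {"id": 3, "city": "Lyon"},
]

inner_rows, inner_rejects = lookup_join(table_a, table_b, "id", inner=True)
outer_rows, _ = lookup_join(table_a, table_b, "id", inner=False)
```

With a small lookup table this is cheap; with a very large one, the index is what drives up memory usage.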
With the DB components, by contrast, the join is performed at the database level and the result moves through the Job row by row.
For example, Table A and Table B are joined in the database, and the result set is sent to the next component or transformation as rows.
DEMO SCENARIO:
TDBINPUT_COMPONENT (query joining the 2 tables, with the DB connection taken from the Repository) ----------> OUTPUT_DBCOMPONENT
Number of components used: 2.
The join is performed at the database level and the output is then produced.
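As a rough illustration of the database-level approach, the sketch below runs the join as a single query and only receives the finished result set. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for PostgreSQL; the table names, columns and data are assumptions made up for the example. In a Job, the equivalent SELECT would go into the query field of the DB input component.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the real DB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO table_a VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO table_b VALUES (1, 'Pune'), (3, 'Lyon');
""")

# The join is expressed in SQL, so the database engine does the matching
# and the client only sees the joined rows.
JOIN_QUERY = """
    SELECT a.id, a.name, b.city
    FROM table_a AS a
    INNER JOIN table_b AS b ON b.id = a.id
    ORDER BY a.id
"""

joined_rows = conn.execute(JOIN_QUERY).fetchall()
conn.close()
```

Here nothing has to be indexed in the Job's own memory, but this only works when both tables live in the same database.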
There is another consideration, however: DB -> DB can be slow. If you want better performance, you can try DB -> FILE -> DB, which usually improves performance.
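A minimal sketch of the DB -> FILE -> DB pattern follows, again using sqlite3 as a stand-in for the real databases and with made-up table names and data. The query result is first written to a CSV file, and the file is then loaded into the target table in one batch. In Talend this would typically be done with the bulk components (for PostgreSQL, tPostgresqlOutputBulk followed by tPostgresqlBulkExec); treat that mapping as a suggestion to verify against your version, not as a tested Job.

```python
import csv
import os
import sqlite3
import tempfile

# Source and target databases (in-memory stand-ins for illustration).
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO table_a VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO table_b VALUES (1, 'Pune'), (3, 'Lyon');
""")
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE joined (id INTEGER, name TEXT, city TEXT)")

# Step 1 (DB -> FILE): write the joined result set to a CSV file.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(source.execute(
        "SELECT a.id, a.name, b.city "
        "FROM table_a AS a INNER JOIN table_b AS b ON b.id = a.id "
        "ORDER BY a.id"
    ))

# Step 2 (FILE -> DB): load the file into the target in one transaction.
with open(path, newline="") as f:
    rows = [(int(r[0]), r[1], r[2]) for r in csv.reader(f)]
with target:
    target.executemany("INSERT INTO joined VALUES (?, ?, ?)", rows)

loaded = target.execute(
    "SELECT id, name, city FROM joined ORDER BY id"
).fetchall()

# Clean up the intermediate file and the connections.
os.remove(path)
source.close()
target.close()
```

Whether this is actually faster depends on the database's bulk loader and the data volume, so it is worth measuring both variants on your own data.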
I hope the difference is clear now.
Please reach out to the Talend Community if necessary.
Thanks,
Ankit.
Hello,
Could you please give us some background about your use case? The memory consumption will depend on the size of the dimension tables, the size of your fact table and data transformations and so on.
Best regards
Sabrina
Hi Ankit,
Thank you for the solution. I understand the difference, but you mentioned that tMap consumes some memory; is it nevertheless faster than a normal query written in the DB component?
Both affect the performance of the Job, so which approach would you prefer in Talend?
You also wrote that DB -> FILE -> DB usually improves performance. Please give me an example of how to do this.