Anonymous
Not applicable

self join in talend

Hi,
Could you please let me know how to do a self join in Talend?
I do not want to use the tOracleInput component several times, as that would mean duplicating the logic and the DB connection, which could add to the run time.
Please let me know if there is a more efficient way to achieve this.
9 Replies
saukema
Creator

You can try using hash components for this. Something like:
tOracleInput --> tHashOutput

tHashInput  
                    --> tMap --> tOutput
tHashInput
Anonymous
Not applicable
Author

Hi saukema,
Thanks for your reply, but it doesn't seem to be working for me. Below is my job:
[Screenshot of the job design: 0683p000009MDMa.png]

In the above job, I need to join the output of the tReplicate_1 component to the tMap_2 component, but when I try to link them, Talend doesn't allow it.
Could you please let me know why, and what the workaround is?
Anonymous
Not applicable
Author

Any updates here, please?
Anonymous
Not applicable
Author

You can't design a job in which two outputs of the same tReplicate feed the same tMap: that would create a cycle, which is not allowed in Talend.
The solution provided by saukema should work for your case. You just have to create an earlier subjob with tOracleInput --> tHashOutput, and then read the hash back twice with two tHashInput components.
Don't forget to add the tHash* components to the palette, as they are hidden by default (Project Properties > Designer > Palette).
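
For anyone wondering what the hash pattern amounts to, here is a rough conceptual sketch in Python. It is not what Talend generates (Talend jobs compile to Java), and the employee rows and column names are made up for illustration. The idea is only that the source is read once, cached in memory, and the cache is then used as both the main flow and the lookup:

```python
# Conceptual sketch only: hypothetical rows, as if read once by a single tOracleInput.
rows = [
    {"emp_id": 1, "name": "Ann", "manager_id": None},
    {"emp_id": 2, "name": "Bob", "manager_id": 1},
    {"emp_id": 3, "name": "Cid", "manager_id": 1},
    {"emp_id": 4, "name": "Dee", "manager_id": 2},
]

# Role of tHashOutput: keep the rows in memory so the query runs only once.
cached = list(rows)

# Lookup side of the tMap: index the cached rows by the join key.
by_id = {row["emp_id"]: row for row in cached}


def self_join(main_rows, lookup):
    """Inner join: pair each employee with its manager from the same data set."""
    joined = []
    for row in main_rows:
        manager = lookup.get(row["manager_id"])
        if manager is not None:
            joined.append((row["name"], manager["name"]))
    return joined


result = self_join(cached, by_id)
print(result)
```

The trade-off is that the whole data set sits in memory for the duration of the job, in exchange for querying the database only once.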
Anonymous
Not applicable
Author

Thanks, corentinduperray, for your reply.
I could use that solution, but since the hash components keep everything in memory, wouldn't it be memory intensive? Is that the recommended way?
The other solution would be to fetch the data from the database again, but if the query takes a long time to execute, that adds to the execution time of the job.
So I am looking for a solution that takes less time and uses less memory.
Can you please help me understand whether that's possible?
Anonymous
Not applicable
Author

Do the self join in SQL in a tOracleInput component.
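
A minimal sketch of what such a query could look like. The `employees` table and its columns are hypothetical, and SQLite (via Python's `sqlite3`) stands in for Oracle purely so the example can be run anywhere. In a Talend job, the SELECT statement itself would go into the Query field of the tOracleInput component:

```python
import sqlite3

# SQLite stands in for Oracle here; the table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ann", None), (2, "Bob", 1), (3, "Cid", 1), (4, "Dee", 2)],
)

# The same table appears twice under different aliases: that is the self join.
SELF_JOIN_SQL = """
    SELECT e.name AS employee, m.name AS manager
    FROM employees e
    JOIN employees m ON m.emp_id = e.manager_id
    ORDER BY e.emp_id
"""

pairs = conn.execute(SELF_JOIN_SQL).fetchall()
print(pairs)
conn.close()
```

With this approach the job needs only one input component and one connection, and only the joined rows come back to Talend.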
Anonymous
Not applicable
Author

Yes, rhall_2.0, that could be a solution too, but is that what Talend suggests: joining the data in the database rather than in Talend?
I thought joining data in Talend was faster than joining it in the database. Please correct me if I am wrong.
Anonymous
Not applicable
Author

Joining in the database is always going to be faster, no matter what the integration tool. A database is designed to retrieve and filter data efficiently. What is the point in returning everything across the network, transforming it into Java objects, joining in Java (running on a virtual machine) and then immediately throwing away the data you don't need? While Talend can do all of that, joining in Talend when all of your data is in a single database is a bit like emptying a bath with an egg cup when you could just pull the plug.
I'm not criticising Talend at all. I wouldn't expect it to be better than a DB at this (or any other tool). This is all about using the right tools and not expecting one tool to do everything. Of course, you can make Talend do it all, but it won't be as efficient as if you select the most efficient tool for each job. There is absolutely nothing wrong with writing SQL when using Talend. I would argue that the best Talend developers make a lot of use of SQL when gathering their data.
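
To make the "returning everything across the network" point concrete, here is a small sketch that counts how many rows leave the database under each approach. It does not measure time, and real performance depends on indexes, data volume and the network. The `orders` table and its data are invented, and SQLite stands in for the real database:

```python
import sqlite3

# SQLite stands in for the database; the `orders` table and its data are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"cust{i % 100}", i) for i in range(1000)],
)

# Approach 1: fetch every row, then join and filter on the client side.
all_rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
seen = set(all_rows)
client_side = sorted(
    (cust, amt, amt - 100)
    for cust, amt in all_rows
    if amt >= 950 and (cust, amt - 100) in seen
)

# Approach 2: let the database do the self join and the filter.
db_side = conn.execute(
    """
    SELECT a.customer, a.amount, b.amount
    FROM orders a
    JOIN orders b ON b.customer = a.customer AND b.amount = a.amount - 100
    WHERE a.amount >= 950
    ORDER BY a.customer, a.amount
    """
).fetchall()
conn.close()

print(len(all_rows), "rows fetched for the client-side join")
print(len(db_side), "rows fetched when the database does the join")
```

Both approaches produce the same joined rows, but the first one has to pull the entire table out of the database before discarding most of it.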