Anonymous
Not applicable

Lookup OR condition in tMap

I have two inputs, a and b.
I need to do a lookup with the following OR condition:
a.a1 = b.b1 or a.a1=b.b2 or a.a1=b.b3; if there is no match in any of these 3 fields, the row goes to a reject file.
Can I simply use one tMap with one lookup to specify this condition and get both the matched and the rejected rows?
Much appreciated.
30 Replies
alevy
Specialist

OK, I understand what you want (not sure what I was thinking): you want row a in your output once if a1 does not match b1, b2 or b3 in any row b.
You'll have to do it with three left-join lookups to "b" with your output condition being: a.a1==null || (b1.b1==null && b2.b1==null && b3.b1==null)
Note that if the joining fields are both null, tMap will treat that as a successful join, which is why I've included the test on a.a1==null.
If your b file is very large, you can speed up the job (and reduce the memory required) by first reading it into a tHashOutput and then using tHashInputs for your three tMap lookups.
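The three-lookup approach above can be sketched in plain Java. This is only an illustration of the logic, not Talend-generated code: the class OrLookupSketch, the row type RowB and the helpers indexBy and isReject are hypothetical names, and each HashMap stands in for one left-join lookup on "b". On duplicate keys the last row wins here, which simplifies whatever match model you pick in tMap.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the three-lookup idea; names are hypothetical.
final class OrLookupSketch {

    // Hypothetical row type for input b with its three candidate key columns.
    static final class RowB {
        final String b1, b2, b3;
        RowB(String b1, String b2, String b3) { this.b1 = b1; this.b2 = b2; this.b3 = b3; }
    }

    // Index the rows of b by one of its columns (1, 2 or 3).
    // Simplification: on duplicate keys the last row wins.
    static Map<String, RowB> indexBy(List<RowB> rows, int column) {
        Map<String, RowB> index = new HashMap<>();
        for (RowB row : rows) {
            String key = column == 1 ? row.b1 : column == 2 ? row.b2 : row.b3;
            index.put(key, row);
        }
        return index;
    }

    // True when the row from a belongs in the reject output.
    static boolean isReject(String a1, Map<String, RowB> byB1,
                            Map<String, RowB> byB2, Map<String, RowB> byB3) {
        RowB m1 = byB1.get(a1); // left join: null when nothing matched
        RowB m2 = byB2.get(a1);
        RowB m3 = byB3.get(a1);
        // A null a1 can still "match" a null key in an index, hence the explicit test.
        return a1 == null || (m1 == null && m2 == null && m3 == null);
    }

    public static void main(String[] args) {
        List<RowB> b = Arrays.asList(new RowB("x", "y", "z"), new RowB("p", null, "q"));
        Map<String, RowB> byB1 = indexBy(b, 1);
        Map<String, RowB> byB2 = indexBy(b, 2);
        Map<String, RowB> byB3 = indexBy(b, 3);
        for (String a1 : Arrays.asList("x", "y", "q", "nope", null)) {
            System.out.println(a1 + " -> " + (isReject(a1, byB1, byB2, byB3) ? "reject" : "match"));
        }
    }
}
```

A Java HashMap accepts a null key, so in this sketch a null a1 would find the row whose b2 is null; that is the same situation the a.a1==null test guards against in the tMap expression.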
Sorry to try so many times to upload

You can edit any previous post you've made to re-attempt the attachment.
Another question regarding tReplicate component

Talend does not allow flows to be split and rejoined within the same subjob; search the forum for "cycle flow" for many posts regarding this.
Anonymous
Not applicable
Author

Thanks a lot for your information. I got my result using your approach.
1) I found it only works when the 3 lookups are left outer joins, but your screenshot shows inner join. Is that a typo?
2) I found that when using tHash* components as the lookup, we don't need to sort before the lookup. If you are using a tFileDelimited, you have to sort first. For this example, if we used tFileDelimited, I would have to sort the b1 file by the field b1, the b2 file by b2 and the b3 file by b3 before the lookup. With tHash we don't need to do that, which is more convenient. Can I ask why it is not required to sort by the different keys?
3) I looked at the cycle flow topic in the forum and understand that Talend doesn't allow this. But I think other tools are probably better than Talend in this respect. I have used Ab Initio for 5 years, and in Ab Initio you can use the replicate component to do the same thing.
Ab Initio automatically lands the flows temporarily to disk and waits for all the data to become available, then proceeds with the merge or lookup in a secondary phase. You can even explicitly specify different phases. What is your opinion on this?
alevy
Specialist

1) The screen-shot doesn't show them as inner-join; the box is not checked?
2) You don't need to sort first when using tFileDelimited (or any other source) for the lookup?
3) I have not used any other products (and only used Talend for less than a year) so cannot compare but bear in mind that Talend is designed to be far more flexible than other products by generating Java code.
Talend reads all the data for the lookup so it's available for the join before starting the main flow, which is generally processed one row at a time. So it's not possible to use one input component for both the lookup and main flows. tHash is the way to achieve this. Talend only uses the disk for lookup data if explicitly told to because of the processing hit.
Anonymous
Not applicable
Author

You have used Talend for less than a year, but I feel you are already an expert.
1) Yes, I didn't realize the inner join box is not checked.
2) I just want to understand more about how the tHash component works.
As I understand it, when we hash a file we usually have a hash key, which is why the lookup is more efficient. In this case, we have 3 different keys for the lookup.
So when we use tHash, which field will the system use as the hash key?
alevy
Specialist

Thanks, but my knowledge is limited to a very small part of Talend's potential.
I think tHash hashes the key fields in the schema but tMap hashes any lookup flow based on the joined fields. This means that there will be additional memory used when tHashInput is a lookup flow rather than a main flow.
Essentially, I think you should just view tHash as a temporary table in memory but I'm just speculating; a real Talend person (or someone prepared to wade through the Java) would have to confirm.
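If that speculation holds, the split of responsibilities might look like the plain-Java sketch below: the rows are held in memory with no key declared, and each lookup builds its own hash index on whichever column it joins on. The class TempTableSketch and the helper indexBy are hypothetical names for illustration; this is not how Talend is confirmed to implement tHash or tMap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch only: an unkeyed in-memory row store, indexed separately per join.
final class TempTableSketch {

    // Build a hash index over the stored rows using the key chosen by the join.
    // Rows that share a key are kept together in a list.
    static <K> Map<K, List<String[]>> indexBy(List<String[]> rows, Function<String[], K> key) {
        Map<K, List<String[]>> index = new HashMap<>();
        for (String[] row : rows) {
            index.computeIfAbsent(key.apply(row), k -> new ArrayList<>()).add(row);
        }
        return index;
    }

    public static void main(String[] args) {
        // Stand-in for the temporary table: plain rows, no key attached.
        List<String[]> b = Arrays.asList(
                new String[] {"x", "y", "z"},
                new String[] {"p", "y", "q"});

        // Each lookup picks its own join column, so the same rows get several indexes.
        Map<String, List<String[]>> byB1 = indexBy(b, row -> row[0]);
        Map<String, List<String[]>> byB2 = indexBy(b, row -> row[1]);

        System.out.println("b1 keys: " + byB1.keySet());
        System.out.println("rows with b2 = y: " + byB2.get("y").size());
    }
}
```

Because every index is a separate map over the same rows, three lookups on three different columns cost three indexes, which is in line with the extra memory mentioned above.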
Anonymous
Not applicable
Author

Peter and alevy, when you are in Talend click on the component on the canvas and press F1. A quite handy guide should pop up from the right hand side that in most cases has a pretty good and detailed description of the component as well as use cases, examples with step by step guidance.
You may also want to consider downloading the user guide(s). However, since this particular component is not in there: the tHashOutput and tHashInput components are temporary tables as alevy figured. They can be used to avoid having to run the same query multiple times for example by storing the results in the Hash table. See one scenario in https://community.talend.com/t5/Design-and-Development/create-an-excel-document-with-multiple-sheets... .
Anonymous
Not applicable
Author

I see. Thanks a lot for your help. My email is peter19972008@gmail.com
Anonymous
Not applicable
Author

Sorry, Alevy. One last question on this topic, just to confirm my understanding: we are not required to specify the hash key in the tHashOutput and tHashInput components; the key is specified in the tMap component.
Is my understanding right?
alevy
Specialist

Yes
Anonymous
Not applicable
Author

Thanks a lot, Alevy and Gabor, for your information.