Anonymous
Not applicable

Compare a list of words from one input file against the lines of another file

Hello,

 

I have one input file that contains rows of words and another input file that contains rows of lines. The idea is to check whether any word is found in a line. If a word is found, that line is rejected; if not, that line is written to an output file.

 

In another programming language, I would read the words file first and put all the words into a list. Then I would compare each line against that list to check whether any word is found in that line.

 

How can I do this in Talend? I guess tMap is the answer...

 

Thanks in advance for your help.

Thomas

1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

Hi,

 

I believe you are looking for the output below.

0683p000009M2Hq.png

 

I set an inner join between the two flows and selected the records that do not match the join. Please refer to the tMap details below.

0683p000009M2Hv.png

 

Since you mentioned that there will be a large volume of data to process, it is also a good idea to set a temp data directory path in tMap, as shown below.

0683p000009M2I0.png

 

Hope I have answered your query 🙂 Please spare a minute to mark the topic as resolved and kudos are also welcome 🙂


Warm Regards,
Nikhil Thampi


8 Replies
dipanjan93
Contributor

Check tFilterRow component!

Anonymous
Not applicable
Author


@dipanjan93 wrote:

Check tFilterRow component!


Hello,

 

Thanks for your answer, mate. But can tFilterRow accept 2 inputs?

dipanjan93
Contributor

Nope, it only works with a single input. If possible, could you please elaborate on your current scenario? I might then be able to share more details with you.

 

 

talend_consumer
Contributor

I have set up a sample job; I hope it helps.

 

file1

id,name,comment
1,Jack,tobedeleted
2,John,marked for delete
3,Ron,keep
4,Sam,review
5,Steve,to be purged

 

lookupfile

status
delete
purge
tobedeleted

 

 

I have attached the flow. In the first tMap, you create a new expression on the right-hand side; see the attached screen prints.
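For readers who cannot see the screen prints, the check that such an expression needs to perform can be sketched in plain Java. This is only an illustration of the logic on the sample data above, not the actual tMap configuration from the attachment: the class name `KeywordFilter` and its methods are hypothetical, and it assumes the comment is the third comma-separated column and that a plain substring match is what is wanted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Talend: rejects rows whose comment
// contains any word from the lookup file.
class KeywordFilter {

    // True when the text contains at least one lookup word as a substring.
    static boolean containsAny(String text, List<String> words) {
        for (String word : words) {
            if (text.contains(word)) {
                return true;
            }
        }
        return false;
    }

    // Keeps only the rows whose comment column (index 2, an assumption
    // based on the sample file) contains none of the lookup words.
    static List<String> keepClean(List<String> rows, List<String> words) {
        return rows.stream()
                .filter(row -> !containsAny(row.split(",", -1)[2], words))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "1,Jack,tobedeleted",
                "2,John,marked for delete",
                "3,Ron,keep",
                "4,Sam,review",
                "5,Steve,to be purged");
        List<String> words = Arrays.asList("delete", "purge", "tobedeleted");

        // Prints the two rows that contain no lookup word:
        // 3,Ron,keep
        // 4,Sam,review
        keepClean(rows, words).forEach(System.out::println);
    }
}
```

Note that this is a substring match, so "purge" also rejects "to be purged". If only whole-word matches should count, the comparison inside `containsAny` would have to change.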

 

hope it helps.

 

Cheers

 

 

 


talendjob.PNG
Anonymous
Not applicable
Author


@dipanjan93 wrote:

Nope it only works with a single input. If possible could you please elaborate your current scenario. I might be able to share details with you thereafter.

 

 


I have 2 input files: one is a person file and the other contains the expenses for each person.

PersonInput
Id;Name;Department
145534;Andrea;IT
342832;Stephan;Operation
552121;Lionel;Finance
799299;Mael;IT
100001;Syergei;Administration

ExpenseInput
No;Name;City;Detail
101;Mael;Dallas;100
102;Melissa;New York;250
103;Pierre;Chicago;700
104;Lionel;Santa Fe;50
105;Andrea;Miami;550
106;Lionel;Washington;150
107;Stephan;Kansas;800
108;Valerie;Detroit;10

So my code would be (my own pseudocode):

PersonList = PersonInput[1..4][Name]

Loop ExpenseInput
    If ExpenseInput[Name] Not In PersonList:
        WriteToOutputFile
    Else:
        Reject
End Loop


So the OutputFile would be like below:

102;Melissa;New York;250
103;Pierre;Chicago;700
108;Valerie;Detroit;10
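
The same logic can be written as a small plain-Java sketch, which may help when comparing it with what a join that keeps only the unmatched rows does. This is not Talend code: the class name `ExpenseAntiJoin` and the method `unmatched` are hypothetical, and it assumes that the name is the second semicolon-separated column in both files and that header lines have already been removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the pseudocode: keep expense rows whose
// name does not appear in the person file.
class ExpenseAntiJoin {

    static List<String> unmatched(List<String> personRows, List<String> expenseRows) {
        // Collect every person name into a set for constant-time lookups.
        Set<String> names = new HashSet<>();
        for (String person : personRows) {
            names.add(person.split(";", -1)[1]);
        }

        // Keep an expense row only when its name is not in the set.
        List<String> output = new ArrayList<>();
        for (String expense : expenseRows) {
            if (!names.contains(expense.split(";", -1)[1])) {
                output.add(expense);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> persons = Arrays.asList(
                "145534;Andrea;IT",
                "342832;Stephan;Operation",
                "552121;Lionel;Finance",
                "799299;Mael;IT",
                "100001;Syergei;Administration");
        List<String> expenses = Arrays.asList(
                "101;Mael;Dallas;100",
                "102;Melissa;New York;250",
                "103;Pierre;Chicago;700",
                "104;Lionel;Santa Fe;50",
                "105;Andrea;Miami;550",
                "106;Lionel;Washington;150",
                "107;Stephan;Kansas;800",
                "108;Valerie;Detroit;10");

        // Prints the rows for Melissa, Pierre and Valerie.
        unmatched(persons, expenses).forEach(System.out::println);
    }
}
```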


I have tried a tMap join, but it always ends up out of memory because my ExpenseInput is very large. That is why I need to loop.
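
To make the memory concern concrete, here is a rough plain-Java sketch of the loop idea: only the small person file is held in memory, while the large expense file is read and written one line at a time. It says nothing about how tMap manages memory internally. The class name `StreamingAntiJoin`, the method `filter`, and the file paths are hypothetical, and the sketch assumes both files start with a header line, are UTF-8 encoded, and contain only well-formed semicolon-separated rows.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: stream the large file, keep only the small one in memory.
class StreamingAntiJoin {

    // Returns the number of expense lines written to the output file.
    static long filter(Path persons, Path expenses, Path output) throws IOException {
        // Load the names from the small person file into a set.
        Set<String> names = new HashSet<>();
        try (BufferedReader in = Files.newBufferedReader(persons, StandardCharsets.UTF_8)) {
            in.readLine(); // skip the header line (assumed present)
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.isEmpty()) {
                    names.add(line.split(";", -1)[1]);
                }
            }
        }

        // Read the large expense file line by line; nothing is accumulated.
        long written = 0;
        try (BufferedReader in = Files.newBufferedReader(expenses, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            in.readLine(); // skip the header line (assumed present)
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                if (!names.contains(line.split(";", -1)[1])) {
                    out.write(line);
                    out.newLine();
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StreamingAntiJoin <personFile> <expenseFile> <outputFile>
        long written = filter(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
        System.out.println(written + " lines written");
    }
}
```

With this layout, memory use grows with the number of distinct person names rather than with the size of the expense file.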

Thanks in advance for your help.

talend_consumer
Contributor

Let me know if you need more information on the attachment I posted previously. For memory issues there are several options; please search the Talend community, where some recommended ones are described. For example, you can use the local drive for tMap processing and also increase the memory size for the job.

you can also use tfuzzymap component.

 

Cheers

Ashish


Anonymous
Not applicable
Author

I wish I could give you 100 kudos!

 

Thanks a lot mate !