
How to extract lines/rows that have a regex in the line
I have a file, and I need to process the lines that match a regex with one tJavaRow and the rejected lines with another tJavaRow. I tried tFileInputRegex, but it returns only the matched text (not the whole line), and the rejected row has nothing in it. For example, if I try this:
tFileInputRegex ----success---> tJavaRow1
|____________reject _________>tJavaRow2
and print the row values in tJavaRow1 and tJavaRow2, tJavaRow1 gets the matched text and tJavaRow2 gets null. So if my regex is 'sri', a line containing it, such as 'sriisnice', yields 'sri' in the row, and a line such as 'talendisgreat' yields null. I need to get the actual line. How do I do that?
I also tried a few other file processing components but could not find one that can do it. I might be able to make tExtractRegexFields or some other processing component do it, with a tJavaRow putting the line in a context variable, but no luck so far. I'll keep you posted if I find one.
Accepted Solutions

Dijke,
I am using the right regex to match the whole string; please look at my earlier reply. I looked further at the code: if there is a match, the component takes the first matching group. That should have been the whole string, but I get nothing. Giving only the string I am looking for returns just that string from within the longer line. So tFileInputRegex, as documented, splits the line by the regex but does not give you the complete match. I therefore worked around it by holding the line in a context variable and reading it back afterwards, on both the match and the no-match path. So the flow is:
tJavaRow (put the line into a context variable) ---> tExtractRegexFields ----success---> tJavaRow (get the line from the context variable)
|____________reject _________> tJavaRow (get the line from the context variable)
This worked.
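
For readers outside Talend Studio, the idea behind this workaround can be sketched in plain Java. This is only an illustration, not the code of the job: `LineRouter`, `route`, `matched`, and `rejected` are hypothetical names, and the local variable `currentLine` merely stands in for the context variable. The point is that the original line is saved before any regex work, so both branches can still read it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the job: one "success" output and one "reject" output.
class LineRouter {
    final List<String> matched = new ArrayList<>();   // plays the role of the success flow
    final List<String> rejected = new ArrayList<>();  // plays the role of the reject flow
    private final Pattern pattern;

    LineRouter(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    void route(String line) {
        // Keep the untouched line first (the context variable in the workaround).
        String currentLine = line;
        // find() looks for the regex anywhere in the line, so "sri" is enough.
        if (pattern.matcher(currentLine).find()) {
            matched.add(currentLine);   // whole line, not just the matched text
        } else {
            rejected.add(currentLine);  // whole line, not null
        }
    }

    public static void main(String[] args) {
        LineRouter router = new LineRouter("sri");
        router.route("sriisnice");
        router.route("talendhasgreatcommunity");
        System.out.println("success: " + router.matched);   // success: [sriisnice]
        System.out.println("reject:  " + router.rejected);  // reject:  [talendhasgreatcommunity]
    }
}
```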

If possible please provide source and expected target output.
Regards,

My regex: "sri"
If my string is "sriisnice", I want the success output to be "sriisnice" (not "sri"), and if the string is "talendhasgreatcommunity", I expect the reject output to be "talendhasgreatcommunity". If I try my regular expression ".*sri.*" on any regex testing site, it shows a full match, but I don't get anything!
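
One possible explanation, offered as an assumption and not verified against the component's source, is that the output columns are filled from capture groups and not from the overall match. In plain `java.util.regex`, ".*sri.*" matches the whole line but contains no capture group, so there is nothing to read as "group 1". The sketch below (`GroupDemo` and `firstGroup` are hypothetical names) shows that difference in Java only; whether tFileInputRegex behaves the same way is not confirmed here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns capture group 1 when the regex matches the whole line
    // and actually defines a group; otherwise returns null.
    static String firstGroup(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (m.matches() && m.groupCount() >= 1) {
            return m.group(1);
        }
        return null;
    }

    public static void main(String[] args) {
        // Full match, but no parentheses, so there is no group 1.
        System.out.println(firstGroup(".*sri.*", "sriisnice"));                 // null
        // Wrapping the pattern in parentheses makes the whole line group 1.
        System.out.println(firstGroup("(.*sri.*)", "sriisnice"));               // sriisnice
        // No match at all.
        System.out.println(firstGroup("(.*sri.*)", "talendhasgreatcommunity")); // null
    }
}
```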

Are you looking for the following output?
Regards,

In regex pattern matching there are different search/match options.
In Java and Talend, you need to match the whole string with your regex, for example:
- "hahahBOhahaha" doesn't match the regex "BO", but ".+BO.+" does.
I think this is the case with the regexMatch function.
You need to make sure you match the whole string, or use a different function.
Look into this documentation:
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
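
To make the difference concrete, here is a small plain-Java sketch using `java.util.regex` from the documentation linked above. `Matcher.matches()` requires the regex to cover the entire input, while `Matcher.find()` accepts a match anywhere inside it. The class and method names (`MatchVersusFind`, `matchesWhole`, `containsMatch`) are made up for this illustration, and it says nothing about which of the two a given Talend function uses internally.

```java
import java.util.regex.Pattern;

class MatchVersusFind {
    // True only if the regex consumes the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches some substring of the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String line = "hahahBOhahaha";
        System.out.println(matchesWhole("BO", line));      // false
        System.out.println(matchesWhole(".+BO.+", line));  // true
        System.out.println(containsMatch("BO", line));     // true
    }
}
```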

Vboppudi, while I can't see your tJavaRow code, I bet you are using a regex match there. The problem with that is that I can't do 'if else' there, so I had to use the regex components. I was able to do it by holding the current line in a context variable and using the extract regex component.

