Is there a component or library to standardize email addresses according to RFC 5322, e.g. by stripping out comments? For instance, all of these addresses should be considered the same and should be standardized to "bob@domain.com":
bob@domain.com
bob(1)@domain.com
(robert)bob@domain.com
b((o)o)ob@domain.com
b(o)o(o)b@domain.com
Nested comments mean you can't do this with a simple regular expression; it needs a proper recursive-descent parser. Unless a regex can do that, in which case it's beyond my regex-fu.
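For comments alone, a full recursive-descent parser may not be needed. A single nesting-depth counter is enough to know whether the current character sits inside a comment. Below is a minimal Java sketch under these assumptions: only comments are removed, the address is not validated, folding whitespace around comments is not trimmed, and text inside a "quoted-string" local part is kept verbatim. The class and method names are hypothetical.

```java
public class CommentStripper {

    // Removes RFC 5322-style comments, including nested ones.
    // Sketch only: no validation, no whitespace folding, no case normalization.
    public static String stripComments(String address) {
        StringBuilder out = new StringBuilder(address.length());
        int depth = 0;            // current comment nesting level
        boolean inQuotes = false; // inside a "quoted-string" local part
        for (int i = 0; i < address.length(); i++) {
            char c = address.charAt(i);
            if (c == '\\' && (depth > 0 || inQuotes)) {
                // quoted-pair: the backslash escapes the next character
                if (depth == 0) { // keep escapes that sit inside a quoted-string
                    out.append(c);
                    if (i + 1 < address.length()) out.append(address.charAt(i + 1));
                }
                i++; // skip the escaped character
                continue;
            }
            if (inQuotes) {
                out.append(c);                 // parentheses are literal inside quotes
                if (c == '"') inQuotes = false;
            } else if (c == '(') {
                depth++;                       // entering a (possibly nested) comment
            } else if (c == ')') {
                if (depth == 0) throw new IllegalArgumentException("Unbalanced ')' in: " + address);
                depth--;
            } else if (depth == 0) {
                if (c == '"') inQuotes = true;
                out.append(c);                 // ordinary character outside any comment
            }
            // any other character with depth > 0 is comment text and is dropped
        }
        if (depth != 0) throw new IllegalArgumentException("Unclosed comment in: " + address);
        return out.toString();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "bob@domain.com", "bob(1)@domain.com", "(robert)bob@domain.com",
            "b((o)o)ob@domain.com", "b(o)o(o)b@domain.com"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + stripComments(s));
        }
    }
}
```

Each of the five example addresses reduces to bob@domain.com, and unbalanced parentheses raise an exception instead of being silently half-removed.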
Hi,
Could you please use the tMatchGroup component to group the email addresses based on your matching rules?
There are multiple matching algorithms available in this component, or you can even create an algorithm of your choice.
The choice of algorithm depends on your use case, and I would suggest verifying the results from this component for each algorithm to familiarize yourself with them.
Warm Regards,
Nikhil Thampi
I can't see anything there that will deal with nested parentheses.
Hi,
You can remove the unnecessary nested parentheses with a replace function in tMap. Since a name will never contain them, you can safely remove them and then apply the matching algorithms.
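A replace call is a single pass over the string. The Java sketch below assumes the replace is a regex that deletes parenthesized groups, as in a tMap expression along the lines of `row1.email.replaceAll(...)` (the regex and the `row1.email` column name are hypothetical). It shows the effect on the examples from the question: flat comments disappear, but only the innermost layer of a nested comment is removed. The other reading, deleting just the parenthesis characters, keeps the comment text.

```java
public class SinglePassReplace {

    // One replaceAll pass deletes every "(...)" that contains no further parentheses.
    static String singlePass(String email) {
        return email.replaceAll("\\([^()]*\\)", "");
    }

    public static void main(String[] args) {
        System.out.println(singlePass("b(o)o(o)b@domain.com")); // prints bob@domain.com
        System.out.println(singlePass("b((o)o)ob@domain.com")); // prints b(o)ob@domain.com
        // Deleting only the parenthesis characters keeps the comment text:
        System.out.println("b(o)o(o)b@domain.com".replace("(", "").replace(")", "")); // prints booob@domain.com
    }
}
```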
Warm Regards,
Nikhil Thampi
OK, it looks like my only option is to code it manually in Java. I suppose I could write a loop that keeps applying a regex such as \([^\(]*?\) until no match is found. That regex matches an opening parenthesis followed by the next closing parenthesis that does not have another opening parenthesis before it, i.e. an innermost comment.
Is there a Talend component that will apply an expression in a loop, or do I just hard-code it in a tJavaRow?
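The loop described above can be sketched in plain Java as follows. It uses the regex from the post unchanged; each pass strips the innermost comments, so a comment nested n levels deep disappears after n passes. The class and method names are hypothetical, and this sketch does not detect unbalanced parentheses or treat quoted-strings specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LoopedCommentRemoval {

    // The regex from the post: "(" followed by the nearest ")" with no "(" in between.
    private static final Pattern INNERMOST = Pattern.compile("\\([^\\(]*?\\)");

    static String removeComments(String email) {
        String result = email;
        Matcher m = INNERMOST.matcher(result);
        // Keep stripping innermost comments until a pass finds nothing left to remove.
        while (m.find()) {
            result = m.replaceAll("");
            m = INNERMOST.matcher(result);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "bob@domain.com", "bob(1)@domain.com", "(robert)bob@domain.com",
            "b((o)o)ob@domain.com", "b(o)o(o)b@domain.com"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + removeComments(s));
        }
    }
}
```

Inside a tJavaRow the same while loop could be written inline, reading the input column and assigning the result to the output column (for example `output_row.email`, where `email` is a hypothetical column name).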
Hi,
Since you have a lot of data-related issues, the easiest place to handle them is a tJavaRow.
Warm Regards,
Nikhil Thampi