Regex to match repeated letters in string using java pattern
Talend version: 4.2.3 / 5.2.0M3
OS: Windows / Mac
I am trying to filter out phone numbers that consist of repeating characters, e.g.
0000, 111111, 99999999, 8888, 22222, 000000000 - basically anything that is repeated.
I am using the following RegEx in the tMap:
row1.Phone.matches("()\1{3}")?"":row1.Phone
or this
row1.Phone.matches("()\\1{3}")?"":row1.Phone (with the backslash escaped for Java)
When testing the expression I get this:
Exception in thread "main" java.lang.Error: Unresolved compilation problems
This works outside Talend (see the second screenshot).
Any ideas?
Actually, a job is generated behind the scenes to handle this record.
Because your test value is not quoted, the generated job contains an expression like 123.matches(...), which is not valid Java syntax.
You can replace every occurrence of row1.Phone with String.valueOf(row1.Phone), or even row1.Phone+"", to avoid the compilation problem.
This compilation problem does not appear when you run the job, only in the test area, which is a bit confusing from the user's point of view.
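To illustrate the point above, here is a minimal sketch in plain Java (outside Talend). The class and method names are hypothetical, and it assumes the phone value may arrive as a number rather than a String:

```java
final class PhoneToString {
    // Hypothetical helper: turn any phone value (Integer, Long, String, ...) into text.
    static String asText(Object phone) {
        return String.valueOf(phone);
    }

    public static void main(String[] args) {
        // 123.matches("\\d+") would not compile: a numeric literal has no methods.
        // Converting to a String first makes the call legal.
        System.out.println(asText(123).matches("\\d+"));

        // Caveat: String.valueOf on a null object yields the text "null".
        System.out.println(asText(null));
    }
}
```

Note the caveat in the last line: if the column can be null, String.valueOf turns it into the four-character text "null", so a null check may still be needed.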
Another thing is that string.matches(regex) only succeeds when the whole string matches the pattern, so it will filter records like "1111" or "2222", but not "12222" or "22221". Pattern.matches(regex, string) gives the same results.
So I propose using the following expression to filter inputs that contain a repetition anywhere inside:
java.util.regex.Pattern.compile("()\\1{3}").matcher(String.valueOf(row1.Phone)).find() ? "" : String.valueOf(row1.Phone)
This also works in the test area, since I added String.valueOf.
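The difference between matches() and find() described above can be sketched in plain Java as follows. The pattern (\\d)\\1{3} is an assumption (one digit followed by the same digit three more times), since the pattern as posted shows an empty group; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

final class RepeatCheck {
    // Assumed pattern: capture one digit, then require that same digit three more times.
    static final Pattern FOUR_SAME = Pattern.compile("(\\d)\\1{3}");

    // True only if the ENTIRE string is exactly four identical digits.
    static boolean wholeMatch(String s) {
        return FOUR_SAME.matcher(s).matches();
    }

    // True if a run of four identical digits appears ANYWHERE in the string.
    static boolean containsRun(String s) {
        return FOUR_SAME.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1111", "12222", "22221", "1234"}) {
            System.out.println(s + " matches=" + wholeMatch(s) + " find=" + containsRun(s));
        }
    }
}
```

With this pattern, matches() rejects "12222" and "22221" because of the extra digit, while find() accepts both.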
Thanks, no more compilation errors. I see now, very powerful. It's possible to call external Java classes!
Very close now, but not sure all the use cases are covered.
A phone number like this results in null
0800 2222 1234
It's beginning to look like RegEx is not the best answer. What do you think / recommend?
Thanks
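One possible explanation for the "0800 2222 1234" case: find() succeeds as soon as any run of four identical digits appears, and this number contains "2222", so the whole value is blanked. If the intent is to drop only numbers made up entirely of one repeated digit, a regex can still do it. The sketch below is one option under that assumption, not a definitive answer; the pattern (\\d)\\1+ and all names are hypothetical, and separators such as spaces are stripped before testing:

```java
import java.util.regex.Pattern;

final class PhoneCleaner {
    // Assumed rule: the whole number is a single digit repeated two or more times.
    static final Pattern ALL_SAME = Pattern.compile("(\\d)\\1+");

    static boolean isAllSameDigit(String phone) {
        // Drop spaces, dashes and any other non-digit characters first.
        String digits = phone.replaceAll("\\D", "");
        // matches() tests the whole string, so a partial run such as "2222" is not enough.
        return ALL_SAME.matcher(digits).matches();
    }

    // Returns "" for junk numbers and the original text otherwise.
    static String clean(Object phone) {
        String text = String.valueOf(phone);
        return isAllSameDigit(text) ? "" : text;
    }

    public static void main(String[] args) {
        System.out.println("[" + clean("0800 2222 1234") + "]");
        System.out.println("[" + clean("99999999") + "]");
    }
}
```

In a tMap, the same idea could be written inline or placed in a custom routine; either way, a genuine number that merely contains a repeated run is kept.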