[resolved] How to extract substrings according to regex pattern in tMap
Hi,
I am new to Talend and am struggling to work out how to perform regex string manipulations in tMap.
Situation is: I have a column that has some date information embedded in text like this:
"AC/DC tickets for 20/12/2010"
Here I want to extract the date. My first approach was to use tExtractRegexFields, but since "cycles" in the data flow are not supported, I don't see a way to rejoin this column with the rest of the dataset once it is split off.
I found a regex pattern
"([0-9]{2}/[0-9]{2}/[0-9]{4})"
that identifies the date, but here is my question: what is the correct Java statement I have to write in the tMap expression builder for this column? I tried Pattern.compile() but could not come up with a valid, working expression.
The source column holds the whole string; the destination column should contain only the date, i.e. the substring that matches the regex pattern.
I use TOS 4.0.
Any help is appreciated.
Thanks
dexter
Hi,
You can use the tExtractRegexFields component with this pattern: "^(.+)([0-9]{2})/([0-9]{2})/([0-9]{4})$"
In your output schema you just have to declare 4 columns, one per capture group:
- text
- day
- month
- year
And that's all.
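To illustrate how the four capture groups line up with the four output columns, here is a minimal plain-Java sketch using java.util.regex. It assumes the digit classes in the pattern are [0-9]; the class name DateSplitDemo and the method split are made up for the example and are not part of Talend. It only shows how the regex engine assigns the groups, not how the component works internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not a Talend component or routine.
final class DateSplitDemo {

    // Assumed pattern: group 1 = text, group 2 = day, group 3 = month, group 4 = year.
    static final Pattern DATE_LINE =
            Pattern.compile("^(.+)([0-9]{2})/([0-9]{2})/([0-9]{4})$");

    /** Returns {text, day, month, year}, or null when the whole line does not match. */
    static String[] split(String line) {
        Matcher m = DATE_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }
}
```

Note that the text group keeps the trailing space in front of the date, so you may want to trim it afterwards.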
Hi eguerin,
Thanks for the quick reply and the improved regex.
tExtractRegexFields isn't the best choice for me here, since as far as I can see it can only split up one column at a time, and I have more than one column in the data flow that needs some string cleansing.
I have found posts dealing with similar problems:
https://community.talend.com/t5/Design-and-Development/resolved-Using-TMap-to-parse-complex-string/t... It suggests that regex string manipulations can be done in tMap, which would be very elegant. But what exactly is the Java statement to extract a substring from a given input string using a defined regex pattern? Should I write it directly into the code, or can I implement it in the tMap expression builder?
Thanks
dexter
OK, you can do this with a routine (under Code > Routines).
You have to create a new routine and write your code in Java.
After that, you can call your routine from the expression builder (in the tMap) like this:
routines.yourRoutineName.yourMethodName(param1, paramx)
It's very easy to create a new routine.
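As a rough sketch of what such a routine could look like: the names RegexUtils and extractFirst are made up for this example, and the digit classes in the pattern are assumed to be [0-9]. In Talend the routine file normally starts with "package routines;" and the class is public; both are left out here so the snippet compiles on its own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical routine; the class and method names are illustrative only.
final class RegexUtils {

    private RegexUtils() {
        // static helper, no instances needed
    }

    /**
     * Returns the first substring of input that matches regex,
     * or null when input is null or nothing matches.
     */
    public static String extractFirst(String input, String regex) {
        if (input == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }
}
```

In the tMap expression for the output column you would then write something like routines.RegexUtils.extractFirst(row1.text, "[0-9]{2}/[0-9]{2}/[0-9]{4}"), where row1.text stands for your own input column. The pattern is compiled again on every call, which keeps the sketch short; for large flows you may prefer to keep a precompiled Pattern in a static field.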