Regex to get middle of a string with known character boundaries

Anonymous · ‎2013-12-09

Hi, can I use a regex in tMap to get only what's in the .* portion of this string?
( Specific Text: ).*()
If not, what's the best way to go about this?
Thank You

Anonymous · ‎2013-12-09

I would use a routine like this:
http://www.talendforge.org/exchange/index.php?eid=1054&product=tos&action=view&nav=1,1,1
I would put the portion of text you want to extract inside a regex group and leave everything surrounding it outside the group.

Anonymous · ‎2013-12-09

Hi, and thanks for sharing this.
EDIT: I see now that this is a user routine. I have installed it.

Anonymous · ‎2013-12-09

Would you give me an example of how to use the extractByRegexGroup expression given the string in my original question?

Anonymous · ‎2013-12-10

This is the regex you need: Specific Text: (.*)
For the index of the group to extract, use 1 (regex capture groups are numbered starting at 1).
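
For anyone reading along without the routine installed, here is a minimal plain-Java sketch of the same idea. The `extractGroup` helper is hypothetical and is not the `RegexUtil` routine from Talend Exchange; it assumes the pattern is searched for anywhere in the input (`find()`), which may or may not be how the routine applies it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGroupSketch {

    // Hypothetical helper (NOT the Talend Exchange routine): returns the text
    // captured by the given group, or null when the pattern is not found.
    static String extractGroup(String input, String regex, int group) {
        if (input == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        // find() searches anywhere in the input; group 0 is the whole match,
        // group 1 is the first pair of parentheses.
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String field = "Specific Text: the part I want";
        // Prints: the part I want
        System.out.println(extractGroup(field, "Specific Text: (.*)", 1));
    }
}
```

In a tMap expression the equivalent call would go through whatever routine is installed in the project; the sketch only shows why the group index is 1 rather than 0.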

Anonymous · ‎2013-12-10

Hi, using this in a tMap I am attempting the following:
RegexUtil.extractByRegexGroup(MyTable.MyField," Purchase Timeframe: (.*)",1)
But nothing is coming through into the table

Anonymous · ‎2013-12-11

Could you please check one of your records with a regex test tool? As you can see in my picture, the regex works.
This routine has been working in my projects for a couple of years, and I am absolutely sure the problem is in your data or your job.

Anonymous · ‎2013-12-11

Hi, thanks for sticking with me. My regex was wrong; it should have been:
" Purchase Timeframe: (.*) "
Not:
" Purchase Timeframe: (.*)"

Anonymous · ‎2013-12-11

OK, I have had partial success, but a good portion of the rows are being rejected due to a Data Truncation error.
Again here is my tMap expression:
RegexUtil.extractByRegexGroup(tablename.fieldname," Purchase Timeframe: (.*) ",1)
From what I can see, the rows that make it through properly are the ones in which the Purchase Timeframe value is the last thing in the field.
So for example, a field like this:
".....contentcontentcontent.... Purchase Timeframe: One Month "
Gets through and appears perfectly in the target table as:
One Month
But a field like this:
".....contentcontentcontent.... Purchase Timeframe: One Week Will Finance Purchase: Yes I Have a Trade-in: No "
Fails, and my tLogRow rejects output looks like this for the row:
One Week Will Finance Purchase: Yes I Have a Trade-in: No||||Data truncation: Data too long for column 'BuyBy' at row 1 - Line: 290
So far, the only difference I can see is that in the rows that made it through, the value sits at the end of the field.
I will keep looking to see if I can find any other difference.
Thank You

Anonymous · ‎2013-12-11

I think I have some valid results by adding a ? to the regex, which makes the group non-greedy:
RegexUtil.extractByRegexGroup(tablename.fieldname," Purchase Timeframe: (.*?) ",1)
I am no longer getting any Data Truncation errors.
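
To see why the truncation errors went away, here is a rough plain-Java comparison of the greedy and non-greedy patterns on the sample field from the earlier post. The `firstGroup` helper is hypothetical and assumes the routine searches with `find()`. Under that assumption the non-greedy group stops at the first space, so it is worth checking that multi-word values such as "One Week" still arrive complete. The third pattern is only a suggestion: it assumes the next label is always "Will Finance Purchase:" or that the value ends the field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazySketch {

    // Hypothetical helper (not the Talend routine): first capture group or null.
    static String firstGroup(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String field = "content Purchase Timeframe: One Week "
                + "Will Finance Purchase: Yes I Have a Trade-in: No ";

        // Greedy: runs to the LAST space in the field, hence "Data too long".
        // Prints: One Week Will Finance Purchase: Yes I Have a Trade-in: No
        System.out.println(firstGroup(field, " Purchase Timeframe: (.*) "));

        // Non-greedy: stops at the FIRST space after the label.
        // Prints: One
        System.out.println(firstGroup(field, " Purchase Timeframe: (.*?) "));

        // Assumed boundary: stop before the next known label or at end of field.
        String bounded =
                " Purchase Timeframe: (.*?)(?= Will Finance Purchase:|\\s*$)";
        // Prints: One Week
        System.out.println(firstGroup(field, bounded));
        // Prints: One Month
        System.out.println(firstGroup("content Purchase Timeframe: One Month ", bounded));
    }
}
```

If the installed routine matches against the whole string rather than searching within it, the results may differ, so testing against a few real rows is the safer check.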

Talend Data Integration

v5.x