Anonymous
Not applicable

Regex to get middle of a string with known character boundaries

Hi, can I use a regex in tMap to get only what's in the .* portion of this string?
(<br>Specific Text: ).*(</br>)
If not, what's the best way to go about this?
Thank you.
10 Replies
Anonymous
Not applicable
Author

I would use a routine like this:
http://www.talendforge.org/exchange/index.php?eid=1054&product=tos&action=view&nav=1,1,1
I would put the portion of text you want to extract inside a regex capture group and leave everything surrounding it outside the group.
Anonymous
Not applicable
Author

Hi, and thanks for sharing this.
EDIT: I see now that this seems to be a user routine. I have it installed.
Anonymous
Not applicable
Author

Would you give me an example of how to use the extractByRegexGroup expression given the string in my original question?
Anonymous
Not applicable
Author

This is the regex you need: <br>Specific Text: (.*)</br>
For the index of the group to extract, use 1 (regex capture groups are numbered starting at 1).
[Screenshot: 0683p000009MC8J.png]
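To make the group index concrete, here is a minimal sketch in plain Java of what such an extraction amounts to. The class RegexGroupSketch and the method extractGroup are hypothetical stand-ins written for illustration; the actual RegexUtil user routine may be implemented differently, and returning null on no match is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the user routine; the real RegexUtil may differ.
final class RegexGroupSketch {

    // Returns the text captured by the given group,
    // or null when the input is null or nothing matches (an assumption).
    static String extractGroup(String input, String regex, int group) {
        if (input == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        // find() looks for the pattern anywhere in the input.
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String field = "<br>Specific Text: some value</br>";
        // Group 0 is the whole match; group 1 is the first parenthesised part.
        System.out.println(extractGroup(field, "<br>Specific Text: (.*)</br>", 1));
    }
}
```

With the sample string above, group 1 holds only the text between the two fixed boundaries.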
Anonymous
Not applicable
Author

Hi, using this in a tMap I am attempting the following:
RegexUtil.extractByRegexGroup(MyTable.MyField,"<br>Purchase Timeframe: (.*)</br>",1)
But nothing is coming through into the target table.
Anonymous
Not applicable
Author

Could you please check one of your datasets with a regex test tool? As you can see in my picture, the regex works.
This routine has worked in my projects for a couple of years, and I am absolutely sure the problem is in your data or your job.
Anonymous
Not applicable
Author

Hi, thanks for sticking with me. My regex was wrong: the data ends the value with <br>, not </br>. It should have been:
"<br>Purchase Timeframe: (.*)<br>"
Not:
"<br>Purchase Timeframe: (.*)</br>"
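A quick way to confirm this kind of mismatch outside the job is to run both patterns against one sample value with java.util.regex. The sketch below is only illustrative: ClosingTagSketch and extract are made-up names, the sample field is modelled on the data described in this thread, and returning null on no match is an assumption about how an empty result would surface.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative check; class and method names are hypothetical.
final class ClosingTagSketch {

    // Returns capture group 1, or null when the pattern is not found.
    static String extract(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String field = "content <br>Purchase Timeframe: One Month<br>";

        // The data never contains "</br>", so this pattern cannot match.
        System.out.println(extract(field, "<br>Purchase Timeframe: (.*)</br>"));

        // The closing boundary now matches what is really in the data.
        System.out.println(extract(field, "<br>Purchase Timeframe: (.*)<br>"));
    }
}
```

The first call finds nothing because the closing boundary is part of the pattern and has to appear literally in the input.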
Anonymous
Not applicable
Author

OK, I have had partial success, but a good portion of the rows are being rejected due to a Data Truncation error.
Again here is my tMap expression:
RegexUtil.extractByRegexGroup(tablename.fieldname,"<br>Purchase Timeframe: (.*)<br>",1)
From what I can see, the rows that make it through properly are the ones in which the Purchase Timeframe value is the last item in the field.
So for example, a field like this:
".....contentcontentcontent.... <br>Purchase Timeframe: One Month<br>"
Gets through and appears perfectly in the target table as:
One Month
But a field like this:
".....contentcontentcontent.... <br>Purchase Timeframe: One Week<br>Will Finance Purchase: Yes<br>I Have a Trade-in: No<br>"
Fails and my tLogRow rejects output looks like this for the row:
One Week<br>Will Finance Purchase: Yes<br>I Have a Trade-in: No||||Data truncation: Data too long for column 'BuyBy' at row 1 - Line: 290
So far, the only difference I can see is that the value sits at the end of the field in the rows that made it through.
I will keep looking to see if I can find any other difference.
Thank you.
Anonymous
Not applicable
Author

I think I have valid results now after adding a ? to the regex, which makes the .* quantifier lazy so it stops at the first <br> instead of the last:
RegexUtil.extractByRegexGroup(tablename.fieldname,"<br>Purchase Timeframe: (.*?)<br>",1)
I am now no longer getting any Data Truncation errors
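The difference between the two patterns can be reproduced in plain Java with the failing sample row from above. This is a sketch only: LazyQuantifierSketch and extract are hypothetical names, and the code uses java.util.regex directly rather than the RegexUtil routine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative comparison of greedy and lazy matching; names are hypothetical.
final class LazyQuantifierSketch {

    // Returns capture group 1, or null when the pattern is not found.
    static String extract(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String field = "content <br>Purchase Timeframe: One Week<br>"
                + "Will Finance Purchase: Yes<br>I Have a Trade-in: No<br>";

        // Greedy: .* runs to the end, then backs up to the LAST <br>.
        System.out.println(extract(field, "<br>Purchase Timeframe: (.*)<br>"));

        // Lazy: .*? stops at the FIRST <br> that follows the label.
        System.out.println(extract(field, "<br>Purchase Timeframe: (.*?)<br>"));
    }
}
```

The greedy pattern captures everything up to the final <br>, which is the over-long value seen in the rejects output; the lazy one captures only One Week. When the value is the last item in the field there is only one <br> after it, so both patterns give the same result, which is why those rows were never rejected.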