Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik GA: Multivariate Time Series in Qlik Predict: Get Details
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Regex Hell help

For some odd reason, it was decided to keep the dates found in electronic contracts embedded in the text of the digital contract, found in a text field. Of course, if the contract is in a different language, you gotta code for that as well. 🙂 Better yet, said date follows this format:

<td width="20%">
Application Date:<br/>
Funding Date:<br/>
Due Date: <br/>
</td>
<td>
March 17, 2005 <br/>
March 21, 2005<br/>
April 11, 2005<br/>
</td>

I need to parse out the correct Due Date (in this instance, it's 4/11/2005) somehow.

Tried regular expressions and, as usual, failed to get that engine to do what I want...

select substring(digital_contract_text from '%Due Date:#"(\f|\n|\r| |\t|\v)*\<br\/\>(\f|\n|\r| |\t|\v)*\<\/td\>(\f|\n|\r| |\t|\v)*\<td\>(\f|\n|\r| |\t|\v)*([a-z]| |[A-Z]|,|[0-9])+\<br\/\>(\f|\n|\r| |\t|\v)*([a-z]| |[A-Z]|,|[0-9])+\<br\/\>(\f|\n|\r| |\t|\v)*([a-z]| |[A-Z]|,|[0-9])+#"\<br\/\>%' for '#')
from table_foo
where id = 370

Hilariously, if one moves the initial #" anywhere else than where it is already, the query blows up.

How do I go about parsing that DueDate out of that HTML mess?

1 Solution

Accepted Solutions
Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

Try using function subfield. See attached sample.

Oleg

Ask me about Qlik Sense Expert Class!

View solution in original post

3 Replies
Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

Try using function subfield. See attached sample.

Oleg

Ask me about Qlik Sense Expert Class!
Not applicable
Author

Sweet. Wish SQL had this...

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

TextBetween() might also be a good approach.

trim(TextBetween(vOriginal, '<br/>', '<', 5));

-Rob