Removing HTML markup code - Page 4 - Qlik Community

mhassinger · ‎2013-01-28

I've got a web query that generates an XML document in the browser. I'm using this as a web file data source in QlikView, and it works as expected, pulling in the XML schema and data. However, one of the fields is full of HTML markup, and I'm not sure of the best way to get it all out. Since the XML is generated dynamically on an internet site, it never hits the server file system, so I can't do anything on that end. Also, the HTML is pretty extensive, with lots of things like:

So it's not as simple as a few replace statements to strip <p> and </p>.

Any ideas?

chasafd · ‎2015-05-22

Thanks, this is very helpful. Do you know how to have it turn the <br> into a CR/LF? I want to strip out the tags but not lose some basic formatting.

I've used a nice extension (MinimalisticHtmlTextBox) that works well but only in Full Browser Mode. I'd like to pull data from SharePoint that works for users who prefer the IE Plugin.

rbecher · ‎2015-05-23

You could replace '<br>' with '\n' before stripping all other HTML tags.
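
If the stripping is done in a VBScript macro function (Tools > Edit Module), that idea could be sketched roughly as below. This is only an illustration under assumptions: the function name stripHTMLKeepBreaks is hypothetical, and it assumes the line breaks appear as <br>, <br/> or <br />. VBScript string literals do not interpret '\n' as an escape sequence, so the sketch uses the built-in vbCrLf constant instead.

```vbscript
' Hypothetical helper: converts <br> tags to CR/LF, then strips all other tags.
Function stripHTMLKeepBreaks(strHTML)
    Dim objRegExp, strOutput
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True

    ' First pass: turn <br>, <br/> and <br /> into a line break.
    objRegExp.Pattern = "<br\s*/?>"
    strOutput = objRegExp.Replace(strHTML, vbCrLf)

    ' Second pass: remove every remaining tag.
    objRegExp.Pattern = "<(.|\n)+?>"
    strOutput = objRegExp.Replace(strOutput, "")

    stripHTMLKeepBreaks = strOutput
    Set objRegExp = Nothing
End Function
```

In the load script it would then be called like any other function, for example stripHTMLKeepBreaks([YourField]) as CleanField, where the field names are placeholders.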

Data & AI Engineer at Orionbelt.ai - a GenAI Semantic Layer Venture, Inventor of Astrato Engine

bmesolutions · ‎2015-11-19

This is brilliant, cheers Ralf!

everest226 · ‎2016-01-22

Step 1: In your extract QVW, add the VBScript code below under Tools > Edit Module. Change Requested Module Security to System Access, and set local security to Allow System Access.

    Function stripHTML(strHTML)
        'Strips the HTML tags from strHTML
        Dim objRegExp, strOutput
        Set objRegExp = New RegExp
        objRegExp.IgnoreCase = True
        objRegExp.Global = True
        objRegExp.Pattern = "<(.|\n)+?>"

        'Replace all HTML tag matches with the empty string
        strOutput = objRegExp.Replace(strHTML, "")

        'Decode the &lt; and &gt; entities back to < and >
        '(done after stripping so the decoded characters are not treated as tags)
        strOutput = Replace(strOutput, "&lt;", "<")
        strOutput = Replace(strOutput, "&gt;", ">")

        stripHTML = strOutput 'Return the value of strOutput
        Set objRegExp = Nothing
    End Function

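Step 2: In Edit Script, apply the function to the field in your LOAD statement:

    Replace(Replace(stripHTML([content/properties/Your field name]),
        '&#58;', ':'),
        '&nbsp;', ' ') as newcleanfieldname,

(The entity strings '&#58;' and '&nbsp;' are assumed here; replace whichever entities are actually left in your data.)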

Anonymous · ‎2016-07-27

What would be the code for fields that come from a database?

Melisa · 2016-09-23

Ralf,

Have you run into a situation where there are just too many values in your HTML_Tag_Map table?

The code works fine for the first 70 records I load - which correlates to 118 lines fetched, but then after that, the script just fails for apparently no reason.

Melisa

rbecher · ‎2016-09-25

Melisa, can you attach an HTML file here to illustrate?

Melisa · 2016-09-25

Ralf,

Thanks for reaching out. It wasn’t the number of records. There was actually some sort of corruption on record 71.

Melisa

cbaqir · ‎2017-09-19

This is great, thanks! Can you do a replace() after the stripHTML function?

I have to replace "_" and .

Anil_Babu_Samineni · ‎2017-09-19

You can use this:

stripHTML_Rep = Replace(stripHTML, "_", ".")
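
Note that in VBScript a function only returns what is assigned to its own name, so the extra replace needs to live in a function of its own or be folded into the return value. A minimal sketch, assuming the stripHTML function posted earlier in this thread is already in the module, and assuming "_" should become "." as in the line above (the wrapper name stripHTML_Rep is borrowed from that line and is otherwise hypothetical):

```vbscript
' Hypothetical wrapper around the stripHTML function from earlier in this thread.
Function stripHTML_Rep(strHTML)
    ' Strip the tags first, then swap every underscore for a period.
    stripHTML_Rep = Replace(stripHTML(strHTML), "_", ".")
End Function
```

The same thing can also be done directly in the load script, in the style of Step 2 above: Replace(stripHTML([YourField]), '_', '.') as CleanField, where the field names are placeholders.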

Best, Anil. When applicable, please mark the correct/appropriate replies as "solution" (you can mark up to 3 "solutions"). Please LIKE threads if the provided solution is helpful.