I've got a webquery that generates an XML document in the browser. I'm using this as a web file data source in QlikView, and it works as expected, pulling in the XML schema and data. However, one of the fields is full of HTML markup, and I'm not sure the best way to get it all out. Since the XML is generated dynamically on an internet site, it never hits the server file system and so I can't do anything on that end. Also, the HTML is pretty extensive, with lots of things like:
<TD STYLE="BORDER-BOTTOM: black 0.5pt solid; BORDER-LEFT: black 0.5pt solid; BACKGROUND-COLOR: white; WIDTH: 208pt; HEIGHT: 12.75pt;">
So it's not as simple as a few replace statements to strip <p> and </p>.
Any ideas?
Thanks, this is very helpful. Do you know how to have it turn the <br> into a CR/LF? I want to strip out the tags but not lose some basic formatting.
I've used a nice extension (MinimalisticHtmlTextBox) that works well but only in Full Browser Mode. I'd like to pull data from SharePoint that works for users who prefer the IE Plugin.
You could replace '<br>' with '\n' before stripping all other HTML tags.
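In a VBScript macro like the stripHTML function posted in this thread, that could look roughly like the sketch below. The name stripHTMLKeepBreaks is made up for illustration, and it assumes the breaks are written as <br>, <br/> or <br />. VBScript string literals have no '\n' escape, so the sketch uses vbCrLf instead.

```vbscript
' Sketch only: stripHTMLKeepBreaks is a hypothetical name, not a built-in.
Function stripHTMLKeepBreaks(strHTML)
    Dim objRegExp, strOutput
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True

    ' First turn <br>, <br/> and <br /> into a CR/LF pair
    objRegExp.Pattern = "<br\s*/?>"
    strOutput = objRegExp.Replace(strHTML, vbCrLf)

    ' Then strip every remaining tag
    objRegExp.Pattern = "<(.|\n)+?>"
    strOutput = objRegExp.Replace(strOutput, "")

    stripHTMLKeepBreaks = strOutput   ' Return the cleaned text
    Set objRegExp = Nothing
End Function
```

The order matters: if the tags are stripped first, the <br> markers are already gone by the time you try to convert them.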
This is brilliant cheers Ralf
Step 1: In your extract QVW, add the VBScript code below under Tools > Edit Module,
then change Requested Module Security to System Access and allow system access.
Function stripHTML(strHTML)
'Strips the HTML tags from strHTML
Dim objRegExp, strOutput
Set objRegExp = New Regexp
objRegExp.IgnoreCase = True
objRegExp.Global = True
objRegExp.Pattern = "<(.|\n)+?>"
'Replace all HTML tag matches with the empty string
strOutput = objRegExp.Replace(strHTML, "")
'Replace all < and > with &lt; and &gt;
strOutput = Replace(strOutput, "<", "&lt;")
strOutput = Replace(strOutput, ">", "&gt;")
stripHTML = strOutput 'Return the value of strOutput
Set objRegExp = Nothing
End Function
Step 2: In Edit Script, add this after the field:
replace(replace(stripHTML([content/properties/Your field name])
,'&#58;',':')
,'&#160;',' ') as newcleanfieldname,
What would be the code for fields that come from a database?
Ralf,
Have you run into a situation where there are just too many values in your HTML_Tag_Map table?
The code works fine for the first 70 records I load - which correlates to 118 lines fetched, but then after that, the script just fails for apparently no reason.
Melisa
Melisa, can you attach an HTML file here to illustrate?
Ralf,
Thanks for reaching out. It wasn’t the number of records. There was actually some sort of corruption on record 71.
Melisa
This is great, thanks! Can you do a replace() after the stripHTML function?
I have to replace "_" with ".".
You can use this:
stripHTML_Rep = Replace(stripHTML, "_", ".")
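One way to read that suggestion is as a small wrapper function in the same module. This is only a sketch: it assumes the stripHTML function from Step 1 is already defined under Edit Module, and it treats stripHTML_Rep as the name of a new function that you would then call from the load script in place of stripHTML.

```vbscript
' Sketch only: assumes stripHTML(strHTML) from Step 1 exists in the same module.
Function stripHTML_Rep(strHTML)
    ' Strip the tags first, then swap every underscore for a period
    stripHTML_Rep = Replace(stripHTML(strHTML), "_", ".")
End Function
```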