mhassinger
Creator

Removing HTML markup code

I've got a webquery that generates an XML document in the browser. I'm using this as a web file data source in QlikView, and it works as expected, pulling in the XML schema and data. However, one of the fields is full of HTML markup, and I'm not sure of the best way to get it all out. Since the XML is generated dynamically on an internet site, it never hits the server file system, so I can't do anything on that end. Also, the HTML is pretty extensive, with lots of things like:

<TD STYLE="BORDER-BOTTOM: black 0.5pt solid; BORDER-LEFT: black 0.5pt solid; BACKGROUND-COLOR: white; WIDTH: 208pt; HEIGHT: 12.75pt;">

So it's not as simple as a few replace statements to strip <p> and </p>.

Any ideas?
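One common workaround, since Replace() only handles one literal string at a time, is to collect every distinct tag that actually occurs in the field into a mapping table (each tag mapped to an empty string) and then run MapSubstring over the field. The Python sketch below only models that idea outside QlikView so the logic can be checked; `build_tag_map` and `map_substring` are hypothetical stand-ins for a MAPPING LOAD and the MapSubstring() script function, not Qlik APIs. It assumes every tag is closed by a `>` and does not decode entities such as `&nbsp;`.

```python
import re


def build_tag_map(values):
    """Collect every distinct <...> tag found in `values`, each mapped to ''.

    Models splitting the field on '<' and keeping the text up to the next '>'
    for each piece (hypothetical helper, not a Qlik API).
    """
    tags = set()
    for value in values:
        # Every piece after the first one started right after a '<'.
        for piece in value.split("<")[1:]:
            end = piece.find(">")
            if end != -1:  # ignore a stray '<' that is never closed
                tags.add("<" + piece[: end + 1])
    return {tag: "" for tag in tags}


def map_substring(mapping, text):
    """Replace every occurrence of any mapping key in `text` with its value.

    Longer keys are tried first at each position (a choice made for this
    sketch); an empty mapping returns the text unchanged.
    """
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


if __name__ == "__main__":
    rows = [
        "<p>Hello <b>world</b></p>",
        '<TD STYLE="WIDTH: 208pt;">42</TD>',
    ]
    tag_map = build_tag_map(rows)
    print([map_substring(tag_map, r) for r in rows])  # ['Hello world', '42']
```

Because the map is built from the data itself, long tags such as the styled `<TD ...>` above are removed without being listed by hand.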

45 Replies
rbecher
MVP

Ah, so you're running a partial reload? Then you have to use:

REPLACE MAPPING LOAD ...
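Why the prefix matters, assuming the standard partial-reload rules: on a partial reload only LOAD/SELECT statements prefixed with REPLACE or ADD are executed, and mapping tables are discarded when the script finishes, so a plain MAPPING LOAD leaves nothing for MapSubstring to use. The Python model below is a simplification under those assumptions (it ignores the ONLY variants and tracks nothing but which maps exist); the `Statement` class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """One LOAD/SELECT in a load script (hypothetical, simplified model)."""
    table: str
    prefix: str = ""          # "", "REPLACE" or "ADD"
    is_mapping: bool = False  # True for a MAPPING LOAD


def statements_to_run(script, partial):
    """Full reload: every statement runs.
    Partial reload: only statements prefixed with REPLACE or ADD run."""
    if not partial:
        return list(script)
    return [s for s in script if s.prefix in ("REPLACE", "ADD")]


def available_maps(script, partial):
    """Mapping tables are dropped when the script ends, so a map is usable
    only if its MAPPING LOAD actually executed in this run."""
    return {s.table for s in statements_to_run(script, partial) if s.is_mapping}


if __name__ == "__main__":
    plain = [Statement("CRMActivity", "REPLACE"),
             Statement("HtmlTag_Map", is_mapping=True)]
    fixed = [Statement("CRMActivity", "REPLACE"),
             Statement("HtmlTag_Map", "REPLACE", is_mapping=True)]
    print(available_maps(plain, partial=True))  # set()
    print(available_maps(fixed, partial=True))  # {'HtmlTag_Map'}
```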

Astrato.io Head of R&D
Not applicable

That worked!!! Thanks for this. I was going to resort to using Set ErrorMode=0, but I didn't want to suppress all the errors.

Thanks again!

rbecher
MVP

I guess this wouldn't help, because the mapping would not be created on a partial reload.

Not applicable

Well, I spoke too soon. It seemed to have worked before, but now the field_cleansed column is empty.

rbecher
MVP

JOIN(CRMActivity) REPLACE LOAD ?

Not applicable

Nope, that didn't do it.

When I view the script execution progress it says:

HtmlTag_Map << CRMActivity 0 lines fetched

HtmlTag_Map << CRMActivity 0 lines fetched

Before it wasn't 0.

rbecher
MVP

It seems the CRMActivity table isn't being filled? Maybe try this:

CRMActivity:
REPLACE LOAD *;
SQL EXEC sp_getdata '$(vEmail)';

Or does the stored procedure return nothing?

Not applicable

The CRMActivity table is being filled. I have a straight table displaying the data and I can see it each time.

One thing to note is that in the Table Viewer I can see "CRMActivity" along with "CRMActivity-1" and "$Syn 1 Table". The latter two were, I guess, created by the mapping?

rbecher
MVP

This is a different issue. The table is not replaced or concatenated because its field structure is different. It's hard to say from a distance, but it could depend on the JOIN.
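A rough model of what the Table Viewer shows, assuming Qlik's default table-building rules: a LOAD whose field set exactly matches an existing table is appended to it automatically; any other field set creates a new table, and a name that is already taken gets a numeric suffix such as -1; two tables sharing more than one field are linked by a synthetic key, stored in a $Syn table. The Python helpers and the field names other than description and Field_Cleansed are hypothetical.

```python
def load_table(model, name, fields):
    """Register a LOAD in `model` (dict: table name -> frozenset of field names)
    and return the name of the table that receives the rows.

    Simplified rules: an identical field set is auto-concatenated onto the
    existing table; otherwise a new table is created, and a taken name gets a
    numeric suffix (-1, -2, ...)."""
    fields = frozenset(fields)
    for existing, existing_fields in model.items():
        if existing_fields == fields:
            return existing  # rows appended, no new table
    new_name, n = name, 0
    while new_name in model:
        n += 1
        new_name = f"{name}-{n}"
    model[new_name] = fields
    return new_name


def synthetic_key_pairs(model):
    """Table pairs sharing more than one field; each such pair gets a synthetic key."""
    names = sorted(model)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if len(model[a] & model[b]) > 1]


if __name__ == "__main__":
    model = {}
    base = ["ActivityID", "Subject", "description"]  # hypothetical field list
    print(load_table(model, "CRMActivity", base))                       # CRMActivity
    print(load_table(model, "CRMActivity", base))                       # CRMActivity (concatenated)
    print(load_table(model, "CRMActivity", base + ["Field_Cleansed"]))  # CRMActivity-1
    print(synthetic_key_pairs(model))  # [('CRMActivity', 'CRMActivity-1')]
```

Under these rules, one extra field is enough to turn a reload of CRMActivity into a second table plus a synthetic key, which matches the tables listed above.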

Not applicable

If I remove the rest of the fields from the join, the field_cleansed column is filled, but it takes a really long time. I guess it's because there's so much HTML in my data.

JOIN(CRMActivity) REPLACE LOAD MapSubstring('HtmlTag_Map', PurgeChar(description, chr(13)&chr(10))) as Field_Cleansed
Resident CRMActivity;
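Another possible contributor to the slowdown: the LOAD above produces only Field_Cleansed, so it shares no field with CRMActivity, and a Qlik JOIN between tables with no common field pairs every row with every row (a Cartesian product). The Python sketch below is a simplified inner join over lists of dicts with a hypothetical `ActivityID` key; it shows how the row count grows and does not model Qlik's default outer-join padding.

```python
from itertools import product


def join_rows(left, right):
    """Simplified inner join of two lists of dict rows on their shared field
    names. With no shared fields the match condition is always true, so every
    left row pairs with every right row (a Cartesian product)."""
    shared = (set(left[0]) & set(right[0])) if left and right else set()
    return [{**lrow, **rrow} for lrow, rrow in product(left, right)
            if all(lrow[f] == rrow[f] for f in shared)]


if __name__ == "__main__":
    # Hypothetical ActivityID key; three activities for illustration.
    activities = [{"ActivityID": i, "description": f"<p>note {i}</p>"} for i in range(3)]
    no_key = [{"Field_Cleansed": f"note {i}"} for i in range(3)]
    with_key = [{"ActivityID": i, "Field_Cleansed": f"note {i}"} for i in range(3)]
    print(len(join_rows(activities, no_key)))    # 9
    print(len(join_rows(activities, with_key)))  # 3
```

Keeping a key field in the joined LOAD, or computing Field_Cleansed in the same LOAD as the other CRMActivity fields, avoids the multiplication.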