Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Join us in Toronto Sept 9th for Qlik's AI Reality Tour! Register Now
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Strip HTML

Hi guys,

I have a field in SQL with HTML tags.

Is there a way to remove these?

<div>Twin needle stitching detailing. Double fabric hood with self coloured cords. Kangaroo pouch pocket with small hidden opening for earphone cord feed. Ribbed cuffs and hem. Worldwide Responsible Accredited Production (WRAP) certified production. 50 great colours. Soft cotton faced fabric creates ideal printing surface. <br />

<br />

<b>Washing Instructions:</b><br />

Machine wash 30&deg;. Do not bleach. Tumble dry low heat. Low iron. Do not dry clean</div>

Iconic school sweatshirt. Twin needle coverseams on neck. Raglan sleeve cuffs and waistband for extra durability. Herringbone back neck tape. Spotshield stain resistant coating for fabric protection. Fade resistant. <br > <br><b>Washing Instructions:</b><br> Suitable for 40°C wash and tumble dry <br>

I'd also like to remove &deg;

These are just a couple of examples!!

Thanks

10 Replies
Anonymous
Not applicable
Author

I've also discovered a field like this!

<div><b>

<ul style="list-style-type:disc">

    <li>Tubular construction for stability</li>

    <li>1x1 rib trim with elastane</li>

    <li>Herringbone neck tape</li>

</ul>

</b> <br />

<b>Fabric:</b><br />

50% Combed ringspun cotton, 50% Polyester <br />

<br />

<b>Weight:</b><br />

295gsm</div>

Is this going to be impossible? Haha

marcus_sommer

Have a look on the qv table-wizard which could natively read html and provided a preview and within the extended are options to read the attribut-values, too.

- Marcus

Anonymous
Not applicable
Author

Hi Marcus,

What if the data is stored in a SQL table.  Can I still do this? If so, how?

Many thanks!

jonathandienst
Partner - Champion III
Partner - Champion III

The way I see it, you have two options (if you want to avoid messy string mangling of the text):

  • Read the database and output the XML as a text file, then load the text file.
  • Read the database to get the XML into a single field, then use LOAD ... FROMFIELD to read the XML

In both cases, you will find it easier to develop the XML load if you manually create a file containing XML in the same schema. Once that load is working, you can use the generated script as a template. Qlikview requires the XML to be well-formed, so partial tags and any constructs not complying with XML rules will prevent the data from loading.

FROMFIELD is a little used feature and there is almost no documentation and very little even here on the community. I have managed to get that working once, but it was some time ago and I no longer have that code. Perhaps someone reading this can offer some asistance.

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
jonathandienst
Partner - Champion III
Partner - Champion III

Another option came to mind after posting the above. VBScript supports regular expressions, and you can call VBScript functions from the load script. You could then parse the text with regular expressions, which will make for much simpler string handling.

Calling vbscript functions is a performance killer, though, so I would only consider this if the size of your data set is not large.

Search the commuity for vbscript and regular expressions / regex for the code.

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
marcus_sommer

This might be helpful: https://www.qlik2go.net/en/2014/08/08/b/

- Marcus

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

See Qlikview Cookbook: Strip HTML From Text http://qlikviewcookbook.com/recipes/download-info/strip-html-from-text/

for a working example of using Regular Expressions to remove html tags.

-Rob

http://masterssummit.com

http://qlikviewcookbook.com

Anonymous
Not applicable
Author

Thanks for your advice guys.  I actually found this in on another thread which seems to work really well !

Data:

first 200

LOAD

  RecNo() as RecNo,

  SKU,

  Description_IncHTML

From

  Data\nop_product_data.qvd (qvd)

where

  not wildmatch(SKU,'ML*')

;

HtmlTag_Map:

MAPPING LOAD DISTINCT '<' & TextBetween('<' & SubField(PurgeChar(Description_IncHTML, chr(13)&chr(10)), '<', IterNo()),'<','>') & '>' as HtmlTag, '' as Substitute

Resident Data

While NOT IsNull(SubField(Description_IncHTML, '<', IterNo()))

;

JOIN(Data) LOAD RecNo, MapSubstring('HtmlTag_Map', Description_IncHTML) as Field_Cleansed

Resident Data

;

I am however still seeing carriage returns

Warm and comfortable fleece jacket. Shaped to fit. Full zip front with cadet collar. Two zipped closed front pockets. Two interior pockets. Adjustable base drawcord. Easy re-label system.

Washing Instructions:

Machine wash 40&deg;. Gentle setting. Do not bleach. Iron low. Do not tumble dry. Do not dry clean

Is there a way to remove these?  I'd also like to remove the "&deg;" too !

Anonymous
Not applicable
Author

Thanks for all your help guys.  I have found a solution here (if anyone is interested)

Data loading with extra characters - need to trim

This worked perfectly for me !