How to Efficiently Clean HTML Tags from Your Data Source

TylerR · Mar 27, 2020 3:48:14 PM

When working with web APIs, one of the problems that we consistently run into is on fields where the user can input into a free-from text box. These fields often flow into Qlik filled with HTML tags, making them difficult to read and larger in size to store. This example demonstrates how we cleaned a "Comment" field, where the user can enter anything into an input box at the source. Here's a sample comment that needs to be cleaned as it would appear from the API.

<div style="" ><div><p>:Tom went to the store to buy apples.<br>There were only 2 apples remaining.<br>So he bought the last 2 apples.</p>
</div>
</div>

Hopefully the following code will save you some time and hassle trying to clean your data. In short, we load in the Comment table, run a "For" loop through each comment, and within each comment for loop, run a "Do" loop through the HTML tags and clean them for readability. We then populate the ID field (to join back to the original table) and the cleaned comment into a second table which can be joined back once the table is complete.

Comments:
LOAD
Comment_ID,
Comment
FROM [lib://QVD Folder Connection/Comments.qvd]
(qvd);

For vRow = 1 to NoOfRows('Comments')
Let vID = Peek('Comment_ID' , vRow-1 , 'Comments');//get the ID of the next record to parse
Let vtext = Peek('Comment' , vRow-1 , 'Comments');//get the original comment of the next record

do while len(TextBetween(vtext, '<', '>'))>0//loop through each comment in place
vtext=Replace(vtext,'<br>',chr(10));//replace line breaks with carriage returns - improves legibility
vtext=Replace(vtext, '<'&TextBetween(vtext, '<', '>')&'>', '');//find groups with <> and replace them with ''

loop;

temp:
load
Comment_ID,
'$(vtext)' as cleancomment
Resident Comments
Where Comment_ID='$(vID)';

next vRow;

left join (Comments)
load *
resident temp;
drop table temp;

As a result, our comment now reads as follows:

Tom went to the store to buy apples.

There were only 2 apples remaining.

So he bought the last 2 apples.

Happy cleaning!

How to Efficiently Clean HTML Tags from Your Data Source

How to Efficiently Clean HTML Tags from Your Data Source

API

HTML

Looping variable script

Qlik Sense

Delta Tags – A new mechanism for efficiently handl...

Conditional Performance Indicators in Tables using...

Passing field selections in Qliksense Mashup throu...

How to Develop Qlik Sense Extensions from D3.js: T...

Setting up a Esri ArcGIS WFS source as a layer in ...