Qlik Community

Qlik Sense Documents

Qlik Sense documentation and resources.

Announcements
QlikWorld online is next week! REGISTER NOW

How to Efficiently Clean HTML Tags from Your Data Source

cancel
Showing results for 
Search instead for 
Did you mean: 
TylerR
Partner
Partner

How to Efficiently Clean HTML Tags from Your Data Source

When working with web APIs, one of the problems that we consistently run into is on fields where the user can input into a free-from text box. These fields often flow into Qlik filled with HTML tags, making them difficult to read and larger in size to store. This example demonstrates how we cleaned a  "Comment" field, where the user can enter anything into an input box at the source. Here's a sample comment that needs to be cleaned as it would appear from the API.

<div style="" ><div><p>:Tom went to the store to buy apples.<br>There were only 2 apples remaining.<br>So he bought the last 2 apples.</p>
</div>
</div>

Hopefully the following code will save you some time and hassle trying to clean your data. In short, we load in the Comment table, run a "For" loop through each comment, and within each comment for loop, run a "Do" loop through the HTML tags and clean them for readability. We then populate the ID field (to join back to the original table) and the cleaned comment into a second table which can be joined back once the table is complete.

 

Comments:
LOAD
Comment_ID,
Comment
FROM [lib://QVD Folder Connection/Comments.qvd]
(qvd);

For vRow = 1 to NoOfRows('Comments')
Let vID = Peek('Comment_ID' , vRow-1 , 'Comments');//get the ID of the next record to parse
Let vtext = Peek('Comment' , vRow-1 , 'Comments');//get the original comment of the next record

do while len(TextBetween(vtext, '<', '>'))>0//loop through each comment in place
vtext=Replace(vtext,'<br>',chr(10));//replace line breaks with carriage returns - improves legibility
vtext=Replace(vtext, '<'&TextBetween(vtext, '<', '>')&'>', '');//find groups with <> and replace them with ''


loop;

temp:
load
Comment_ID,
'$(vtext)' as cleancomment
Resident Comments
Where Comment_ID='$(vID)';

next vRow;

left join (Comments)
load *
resident temp;
drop table temp;

 

As a result, our comment now reads as follows:

Tom went to the store to buy apples.

There were only 2 apples remaining.

So he bought the last 2 apples.

 

Happy cleaning!

 

Version history
Last update:
‎2020-03-27 03:48 PM
Updated by:
Contributors