Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!

How to Efficiently Clean HTML Tags from Your Data Source

No ratings
cancel
Showing results for 
Search instead for 
Did you mean: 
TylerR
Partner - Contributor II
Partner - Contributor II

How to Efficiently Clean HTML Tags from Your Data Source

Last Update:

Mar 27, 2020 3:48:14 PM

Updated By:

TylerR

Created date:

Mar 27, 2020 3:48:14 PM

When working with web APIs, one of the problems that we consistently run into is on fields where the user can input into a free-from text box. These fields often flow into Qlik filled with HTML tags, making them difficult to read and larger in size to store. This example demonstrates how we cleaned a  "Comment" field, where the user can enter anything into an input box at the source. Here's a sample comment that needs to be cleaned as it would appear from the API.

<div style="" ><div><p>:Tom went to the store to buy apples.<br>There were only 2 apples remaining.<br>So he bought the last 2 apples.</p>
</div>
</div>

Hopefully the following code will save you some time and hassle trying to clean your data. In short, we load in the Comment table, run a "For" loop through each comment, and within each comment for loop, run a "Do" loop through the HTML tags and clean them for readability. We then populate the ID field (to join back to the original table) and the cleaned comment into a second table which can be joined back once the table is complete.

 

Comments:
LOAD
Comment_ID,
Comment
FROM [lib://QVD Folder Connection/Comments.qvd]
(qvd);

For vRow = 1 to NoOfRows('Comments')
Let vID = Peek('Comment_ID' , vRow-1 , 'Comments');//get the ID of the next record to parse
Let vtext = Peek('Comment' , vRow-1 , 'Comments');//get the original comment of the next record

do while len(TextBetween(vtext, '<', '>'))>0//loop through each comment in place
vtext=Replace(vtext,'<br>',chr(10));//replace line breaks with carriage returns - improves legibility
vtext=Replace(vtext, '<'&TextBetween(vtext, '<', '>')&'>', '');//find groups with <> and replace them with ''


loop;

temp:
load
Comment_ID,
'$(vtext)' as cleancomment
Resident Comments
Where Comment_ID='$(vID)';

next vRow;

left join (Comments)
load *
resident temp;
drop table temp;

 

As a result, our comment now reads as follows:

Tom went to the store to buy apples.

There were only 2 apples remaining.

So he bought the last 2 apples.

 

Happy cleaning!

 

Contributors
Version history
Last update:
‎2020-03-27 03:48 PM
Updated by: