Skip to main content
Announcements
Qlik Launches Open Lakehouse and advanced agentic AI experience in Qlik Answers! | LEARN MORE

How to Efficiently Clean HTML Tags from Your Data Source

No ratings
cancel
Showing results for 
Search instead for 
Did you mean: 
TylerR
Partner - Contributor II

How to Efficiently Clean HTML Tags from Your Data Source

Last Update:

Mar 27, 2020 3:48:14 PM

Updated By:

TylerR

Created date:

Mar 27, 2020 3:48:14 PM

When working with web APIs, one of the problems that we consistently run into is on fields where the user can input into a free-from text box. These fields often flow into Qlik filled with HTML tags, making them difficult to read and larger in size to store. This example demonstrates how we cleaned a  "Comment" field, where the user can enter anything into an input box at the source. Here's a sample comment that needs to be cleaned as it would appear from the API.

<div style="" ><div><p>:Tom went to the store to buy apples.<br>There were only 2 apples remaining.<br>So he bought the last 2 apples.</p>
</div>
</div>

Hopefully the following code will save you some time and hassle trying to clean your data. In short, we load in the Comment table, run a "For" loop through each comment, and within each comment for loop, run a "Do" loop through the HTML tags and clean them for readability. We then populate the ID field (to join back to the original table) and the cleaned comment into a second table which can be joined back once the table is complete.

 

Comments:
LOAD
Comment_ID,
Comment
FROM [lib://QVD Folder Connection/Comments.qvd]
(qvd);

For vRow = 1 to NoOfRows('Comments')
Let vID = Peek('Comment_ID' , vRow-1 , 'Comments');//get the ID of the next record to parse
Let vtext = Peek('Comment' , vRow-1 , 'Comments');//get the original comment of the next record

do while len(TextBetween(vtext, '<', '>'))>0//loop through each comment in place
vtext=Replace(vtext,'<br>',chr(10));//replace line breaks with carriage returns - improves legibility
vtext=Replace(vtext, '<'&TextBetween(vtext, '<', '>')&'>', '');//find groups with <> and replace them with ''


loop;

temp:
load
Comment_ID,
'$(vtext)' as cleancomment
Resident Comments
Where Comment_ID='$(vID)';

next vRow;

left join (Comments)
load *
resident temp;
drop table temp;

 

As a result, our comment now reads as follows:

Tom went to the store to buy apples.

There were only 2 apples remaining.

So he bought the last 2 apples.

 

Happy cleaning!