When working with web APIs, one of the problems that we consistently run into is on fields where the user can input into a free-from text box. These fields often flow into Qlik filled with HTML tags, making them difficult to read and larger in size to store. This example demonstrates how we cleaned a "Comment" field, where the user can enter anything into an input box at the source. Here's a sample comment that needs to be cleaned as it would appear from the API.
<div style="" ><div><p>:Tom went to the store to buy apples.<br>There were only 2 apples remaining.<br>So he bought the last 2 apples.</p> </div> </div>
Hopefully the following code will save you some time and hassle trying to clean your data. In short, we load in the Comment table, run a "For" loop through each comment, and within each comment for loop, run a "Do" loop through the HTML tags and clean them for readability. We then populate the ID field (to join back to the original table) and the cleaned comment into a second table which can be joined back once the table is complete.
Comments: LOAD Comment_ID, Comment FROM [lib://QVD Folder Connection/Comments.qvd] (qvd);
For vRow = 1 to NoOfRows('Comments') Let vID = Peek('Comment_ID' , vRow-1 , 'Comments');//get the ID of the next record to parse Let vtext = Peek('Comment' , vRow-1 , 'Comments');//get the original comment of the next record
do while len(TextBetween(vtext, '<', '>'))>0//loop through each comment in place vtext=Replace(vtext,'<br>',chr(10));//replace line breaks with carriage returns - improves legibility vtext=Replace(vtext, '<'&TextBetween(vtext, '<', '>')&'>', '');//find groups with <> and replace them with ''
loop;
temp: load Comment_ID, '$(vtext)' as cleancomment Resident Comments Where Comment_ID='$(vID)';
next vRow;
left join (Comments) load * resident temp; drop table temp;