SAITEJA
Contributor II

HTML Tags

Hello,

I have a QVD file in which one field contains various HTML tags. How can I remove them so that the data is clean?

1 Solution

Accepted Solutions
QlikTom
Employee

Sorry for the delay, I just now was able to get back to the community. 

I took your sample data and used this script:
*I stripped out line feeds (Chr(10)) and replaced them with a space character; otherwise the data will not render in a table.
Simply remove the Replace function if you want to keep them and have other plans for display.

If you are receiving any kind of server error, I recommend you contact support; they should be able to diagnose the issue.

SampleData_Temp:
LOAD
    "Field",
    ID
FROM [lib://DataFolder/ExcelData/Community/comm71215.xls]
(biff, embedded labels, table is Sheet1$);


//load all possible HTML tags that exist in field, set replacement as ''
TagMap_Temp:
LOAD DISTINCT 
	'<' & TextBetween('<' & SubField("Field", '<'),'<','>') & '>' as TagMatch
    , '' as Replacement
Resident SampleData_Temp;


//concatenate hand-typed HTML entities into the temp table (this would happen automatically without the Concatenate syntax)
//This inline table can be used to replace any HTML entities with their Unicode counterparts if required.
Concatenate (TagMap_Temp)
LOAD 
	* 
    INLINE [
TagMatch,	Replacement
'&#58;',	':'
];

//create a mapping table of all potential tags, and their replacements from the temp table
TagMap:
MAPPING LOAD 
	* 
Resident TagMap_Temp;

//drop TagMap_Temp, as it is no longer needed
Drop Table TagMap_Temp;


//NoConcatenate prevents automatic concatenation with the source table
NoConcatenate
//use the MapSubString function to remove all possible tags
FinalResult:
load 
	Replace(MapSubString('TagMap',"Field"), Chr(10), ' ') as "Field",
    ID
resident SampleData_Temp;

drop table SampleData_Temp;
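
To make the tag-extraction step in the script above easier to follow, here is a minimal sketch that runs the same expression on two made-up rows. The table names `Sample` and `Tags` and the inline values are assumptions for illustration only; they are not taken from the attached file.

```qlik
// Hypothetical sample rows, for illustration only
Sample:
LOAD * INLINE [
ID|Field
1|<p>Hello <b>world</b></p>
2|<i>note</i>
] (delimiter is '|');

// SubField with two arguments creates one row per '<'-separated piece.
// Each piece gets its leading '<' back, and TextBetween keeps what sits
// between '<' and the first '>', which is the tag body.
Tags:
LOAD DISTINCT
    '<' & TextBetween('<' & SubField(Field, '<'), '<', '>') & '>' as TagMatch
Resident Sample;
```

With these rows, `Tags` should contain `<p>`, `<b>`, `</b>`, `</p>`, `<i>` and `</i>`. The piece before the first `<` has no closing `>`, so it should also produce a `<>` entry, which only matters if the literal text `<>` appears in the data.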

[Screenshot: data load progress]

[Screenshot: result in a table]


13 Replies
raji6763
Creator II

Hi SAITEJA,

You can remove the field from the QVD file.

try this:

LOAD * FROM YourQvd.qvd (qvd);

DROP FIELD HtmltagField;

 

regards,

raji

SAITEJA
Contributor II
Author

Can we do it this way?

Load

Field 

From qvd:

Drop Htmltag field

 

Or do you want me to create an Htmltag field? I am confused here.

raji6763
Creator II

If you are confused, comment out the HTML tag field when you store the data as a QVD.

 

[Screenshot: raji6763_0-1589366636198.png]
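
As a rough sketch of the suggestion above, the field can be commented out in the LOAD statement before the table is stored again. The library path, file names, table name, and the fields `ID`, `Title` and `HtmltagField` are all hypothetical. Note that this drops the field entirely; it does not clean the tags out of it.

```qlik
// Hypothetical file, table, and field names, for illustration only
Data:
LOAD
    ID,
    Title
    // , HtmltagField   // commented out, so this field is not loaded
FROM [lib://DataFolder/YourQvd.qvd] (qvd);

// Store the reduced table as a new QVD without the HTML field
STORE Data INTO [lib://DataFolder/YourQvd_NoHtml.qvd] (qvd);
```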

 

SAITEJA
Contributor II
Author

Actually, I want to remove all the HTML tags from a field in a table. I tried MapSubString, but it shows the error "larger amount data". What alternative would reduce the data load time?

QlikTom
Employee

Hello, take a look at this solution, which will remove standard tags like <a href></a> and translate HTML entities to their proper form.

https://community.qlik.com/t5/New-to-Qlik-Sense/Deleting-text-between-characters/m-p/1685803#M160422

Unfortunately, I am not familiar with the "larger amount data" error. 
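
For reference, one possible way to strip tags without MapSubString is to split each value on `<`, discard everything up to the first `>` in each piece, and glue the pieces back together. The sketch below is only an illustration under assumptions: the table names `Source`, `Pieces` and `Clean`, the field name `CleanField`, and the inline rows are made up, and it has not been tested on large data volumes, so whether it loads faster or avoids the error mentioned above would need to be checked.

```qlik
// Hypothetical source table and values, for illustration only
Source:
LOAD * INLINE [
ID|Field
1|<p>Hello <b>world</b></p>
2|plain text
] (delimiter is '|');

// Split each value on '<': piece 1 is the text before the first tag,
// every later piece starts with a tag body followed by '>'
Pieces:
LOAD
    ID,
    IterNo() as PieceNo,
    SubField(Field, '<', IterNo()) as Piece
Resident Source
While IterNo() <= SubStringCount(Field, '<') + 1;

// Keep only the text after the first '>' in each later piece, then
// concatenate the pieces per ID in their original order
Clean:
LOAD
    ID,
    Concat(If(PieceNo = 1, Piece, Mid(Piece, Index(Piece, '>') + 1)), '', PieceNo) as CleanField
Resident Pieces
Group By ID;

DROP TABLES Source, Pieces;
```

This assumes that `ID` is unique per row and that every `<` in the data opens a tag. A stray `<` in ordinary text would be dropped, so a value such as `a < b` would need separate handling.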

SAITEJA
Contributor II
Author

Thank you, I will try it and let you know😀

SAITEJA
Contributor II
Author

Hello,

I have tried the code but got the general description error.

 

QlikTom
Employee

I am not familiar with the "general description error"

This forum will actually allow you to upload screenshots and sample data files.

I would say the best hope for a solution would be to upload a sample of the data that you are working with (or an artificial version with the same basic properties) in Excel or CSV format. This way, community members can verify that proposed solutions work.


SAITEJA
Contributor II
Author

I have attached an Excel file with similar data, where the field contains different types of data.