Qlik Community

Ask a Question

QlikView Administration

Discussion Board for collaboration on QlikView Management.

Announcements
May 18th 10AM ET, Live Chat, bring your QlikView questions. REGISTER
cancel
Showing results for 
Search instead for 
Did you mean: 
ali_hijazi
Partner
Partner

strip HTML codes from a field during load script

Hello

I have a field in a table that contains HTML tags
I want to strip these HTML tags from the field during load script

How can I accomplish this?

I can walk on water when it freezes
6 Replies
Zhandos_Shotan
Partner
Partner

Hi!

Use this functions TextBetween, Replace and PurgeChar in iterative loop for field value

Here is my solution:

1. Find tags:

TextBetween('<tag>text', '<', '>')  

Returns 'tag'. Use it as search text  in replace function below.

2. Replace tags to nothing:

Replace('<tag>text', TextBetween('<tag>text', '<', '>'), '')

Returns '<>text'

3. Remove all <, > chars

PurgeChar ( '<>text', '<>' )  Returns text

ali_hijazi
Partner
Partner
Author

what did you mean by "in iterative loop for the field value"

can you give me sample script?

I can walk on water when it freezes
Zhandos_Shotan
Partner
Partner

Hi!

Here it is:

tab1:
load * inline
[id, html
1,'<tag1>text1<tag2>text2</tag2></tag1><tag3>text3<tag3>'
2, '<tag4>text4<tag5>text5</tag5><tag6>text6</tag6></tag4>'
];

 

For vRow = 1 to NoOfRows('tab1')
Let vID = Peek('id',vRow-1,'tab1');
Let vtext = Peek('html',vRow-1,'tab1');

do while len(TextBetween(vtext, '<', '>'))>0
vtext=Replace(vtext, TextBetween(vtext, '<', '>'), '');
vtext=Replace(vtext, '<>', '');
loop;

tab2:
load
id,
html,
'$(vtext)' as text
Resident tab1
Where id=$(vID);

next vRow;

drop Table tab1;

 

Happy New Year!

ali_hijazi
Partner
Partner
Author

thank you for your prompt reply

but the thing is that my table is large and adding a loop to the script makes reload takes a lot of time

isn't there a quicker approach?

I tried this approach but I'm getting a General Script Error:

HtmlTag_Map:Mapping
LOAD DISTINCT
'<' & TextBetween('<' & SubField(PYDESCRIPTION, '<', IterNo()),'<','>') & '>' as HtmlTag,
'' as Substitute
Resident CASES_PEGA
While NOT IsNull(SubField(PYDESCRIPTION, '<', IterNo()));

Left Join(CASES_PEGA)
temp:
Load Distinct
CASE_ID
,MapSubstring('HtmlTag_Map', PYDESCRIPTION) as Field_Cleansed
Resident CASES_PEGA;

I can walk on water when it freezes
Zhandos_Shotan
Partner
Partner

Hi!

Table name not allowed in join load script:

Left Join(CASES_PEGA)
temp:
Load Distinct
CASE_ID
,MapSubstring('HtmlTag_Map', PYDESCRIPTION) as Field_Cleansed
Resident CASES_PEGA;

sudhakar_budde
Creator
Creator

Hi,

 

Thanks for your code to strip the html from a field.

This is working as expected. But unfortunately, it is taking 1 second per record to load the data!

As I have more than 50000 records it is going to taking hours to complete the load. 

Is there any quick solution for this please?

To avoid looping, I have tried the below code as well this is not working either!

tab1:

LOAD RecNo() as RecNo,
Html_Field
,Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Html_Field, TextBetween(Html_Field, '<', '>'), ''), '<>', ''),TextBetween(Html_Field, '{', '}'),''),'{}',''),'&ns;',''), '&nsp;',''),'v\:*',''),'o\:*', ''),'w\:*',''),'..shape','') as Cleaned_Field

FROM [xyz.qvd] (qvd);

I also tried the code in the below link, this is not working either!

https://community.qlik.com/t5/QlikView-App-Development/Removing-HTML-markup-code/td-p/466050

 

Is there any other solution please?

 

Thanks 

SB