Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
ali_hijazi
Partner - Master II
Partner - Master II

strip HTML codes from a field during load script

Hello

I have a field in a table that contains HTML tags
I want to strip these HTML tags from the field during load script

How can I accomplish this?

I can walk on water when it freezes
2 Solutions

Accepted Solutions
Zhandos_Shotan
Partner - Creator II
Partner - Creator II

Hi!

Here it is:

tab1:
load * inline
[id, html
1,'<tag1>text1<tag2>text2</tag2></tag1><tag3>text3<tag3>'
2, '<tag4>text4<tag5>text5</tag5><tag6>text6</tag6></tag4>'
];

 

For vRow = 1 to NoOfRows('tab1')
Let vID = Peek('id',vRow-1,'tab1');
Let vtext = Peek('html',vRow-1,'tab1');

do while len(TextBetween(vtext, '<', '>'))>0
vtext=Replace(vtext, TextBetween(vtext, '<', '>'), '');
vtext=Replace(vtext, '<>', '');
loop;

tab2:
load
id,
html,
'$(vtext)' as text
Resident tab1
Where id=$(vID);

next vRow;

drop Table tab1;

 

Happy New Year!

View solution in original post

Zhandos_Shotan
Partner - Creator II
Partner - Creator II

Hi!

Table name not allowed in join load script:

Left Join(CASES_PEGA)
temp:
Load Distinct
CASE_ID
,MapSubstring('HtmlTag_Map', PYDESCRIPTION) as Field_Cleansed
Resident CASES_PEGA;

View solution in original post

7 Replies
Zhandos_Shotan
Partner - Creator II
Partner - Creator II

Hi!

Use this functions TextBetween, Replace and PurgeChar in iterative loop for field value

Here is my solution:

1. Find tags:

TextBetween('<tag>text', '<', '>')  

Returns 'tag'. Use it as search text  in replace function below.

2. Replace tags to nothing:

Replace('<tag>text', TextBetween('<tag>text', '<', '>'), '')

Returns '<>text'

3. Remove all <, > chars

PurgeChar ( '<>text', '<>' )  Returns text

ali_hijazi
Partner - Master II
Partner - Master II
Author

what did you mean by "in iterative loop for the field value"

can you give me sample script?

I can walk on water when it freezes
Zhandos_Shotan
Partner - Creator II
Partner - Creator II

Hi!

Here it is:

tab1:
load * inline
[id, html
1,'<tag1>text1<tag2>text2</tag2></tag1><tag3>text3<tag3>'
2, '<tag4>text4<tag5>text5</tag5><tag6>text6</tag6></tag4>'
];

 

For vRow = 1 to NoOfRows('tab1')
Let vID = Peek('id',vRow-1,'tab1');
Let vtext = Peek('html',vRow-1,'tab1');

do while len(TextBetween(vtext, '<', '>'))>0
vtext=Replace(vtext, TextBetween(vtext, '<', '>'), '');
vtext=Replace(vtext, '<>', '');
loop;

tab2:
load
id,
html,
'$(vtext)' as text
Resident tab1
Where id=$(vID);

next vRow;

drop Table tab1;

 

Happy New Year!

ali_hijazi
Partner - Master II
Partner - Master II
Author

thank you for your prompt reply

but the thing is that my table is large and adding a loop to the script makes reload takes a lot of time

isn't there a quicker approach?

I tried this approach but I'm getting a General Script Error:

HtmlTag_Map:Mapping
LOAD DISTINCT
'<' & TextBetween('<' & SubField(PYDESCRIPTION, '<', IterNo()),'<','>') & '>' as HtmlTag,
'' as Substitute
Resident CASES_PEGA
While NOT IsNull(SubField(PYDESCRIPTION, '<', IterNo()));

Left Join(CASES_PEGA)
temp:
Load Distinct
CASE_ID
,MapSubstring('HtmlTag_Map', PYDESCRIPTION) as Field_Cleansed
Resident CASES_PEGA;

I can walk on water when it freezes
Zhandos_Shotan
Partner - Creator II
Partner - Creator II

Hi!

Table name not allowed in join load script:

Left Join(CASES_PEGA)
temp:
Load Distinct
CASE_ID
,MapSubstring('HtmlTag_Map', PYDESCRIPTION) as Field_Cleansed
Resident CASES_PEGA;

sudhakar_budde
Creator
Creator

Hi,

 

Thanks for your code to strip the html from a field.

This is working as expected. But unfortunately, it is taking 1 second per record to load the data!

As I have more than 50000 records it is going to taking hours to complete the load. 

Is there any quick solution for this please?

To avoid looping, I have tried the below code as well this is not working either!

tab1:

LOAD RecNo() as RecNo,
Html_Field
,Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Html_Field, TextBetween(Html_Field, '<', '>'), ''), '<>', ''),TextBetween(Html_Field, '{', '}'),''),'{}',''),'&ns;',''), '&nsp;',''),'v\:*',''),'o\:*', ''),'w\:*',''),'..shape','') as Cleaned_Field

FROM [xyz.qvd] (qvd);

I also tried the code in the below link, this is not working either!

https://community.qlik.com/t5/QlikView-App-Development/Removing-HTML-markup-code/td-p/466050

 

Is there any other solution please?

 

Thanks 

SB

 

 

sebastian_plosarek
Contributor
Contributor

Worked for me, thank you.