sahilkain1
Contributor II

Need to extract the text between > and < from the HTML Code

Hi All,

I have a column coming from a DB that holds data in the form of HTML code, and I need to extract the plain text from it. One possible approach I found is to get the data between the > and < brackets using the TextBetween function, but the Do While loop shown in the link below is not working for my code.

https://community.qlik.com/t5/App-Development/Extract-Plain-Text-from-HTML/td-p/1670967

Table 1:

Load ID,
     Text
From Table XYZ;

My Text field is in the format below:

<html><head></head><body><div><p class=MsoNormal style="MARGIN: 0in 0in 0pt; LINE-HEIGHT: normal"><span style='FONT-SIZE: 10pt; FONT-FAMILY: "Cambria",serif'>OK to Pay <?xml:namespace prefix = "o" ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p></span></p><p class=MsoNormal style="MARGIN: 0in 0in 0pt; LINE-HEIGHT: normal"><span style='FONT-SIZE: 10pt; FONT-FAMILY: "Cambria",serif'>Indemnity Loss <o:p></o:p></span></p><p class=MsoNormal style="MARGIN: 0in 0in 0pt; LINE-HEIGHT: normal"><span style='FONT-SIZE: 10pt; FONT-FAMILY: "Cambria",serif'>Payee: <o:p></o:p></span></p><p class=MsoNormal style="MARGIN: 0in 0in 0pt; LINE-HEIGHT: normal"><span style='FONT-SIZE: 10pt; FONT-FAMILY: "Cambria",serif'>Invoice: 5710 fees for <o:p></o:p></span></p><p class=MsoNormal style="MARGIN: 0in 0in 0pt; LINE-HEIGHT: normal"><span style='FONT-SIZE: 10pt; FONT-FAMILY: "Cambria",serif'>Amount: $ 2689.82</span></p><p class=MsoNormal style="MARGIN: 0in 0in 0pt; LINE-HEIGHT: normal"><span style='FONT-SIZE: 10pt; FONT-FAMILY: "Cambria",serif'>Code: AT <a onclick="OpenNewBrowser( ' ../../Desktop/FileNotes/FNViewAttachment.aspx?UIC=M%3d1%26FileNoteID%3dC1A39398DB6CB7B4%26A%3d13%26ClaimID%3d5AA130AEE2C40E72%26AttachmentID%3dFADE85B765ECD459 ', '', 'width=990,height=641,status,statusbar,scrollbars,resizable,menubar,top=0,left=0');" title="Legal Cor" href="javascript&colon;void(0)"><font color=#0066cc>WCS20200327_09190385.PDF</font></a><o:p></o:p></span></p></div></body></html>


Output Expected:
OK to Pay Indemnity Loss Payee Invoice: 5710 fees for Amount: $ 2689.82 Code: AT WCS20200327_09190385.PDF


Thanks in Advance

2 Replies
MayilVahanan

Hi @sahilkain1

Try something like the script below:

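// Split each HTML string on 'serif' and then on 'font' (one row per fragment),
// then keep the text between the first '>' and the next '<' of each fragment.
Temp:

Load *,
     TextBetween(SubField(SubField(Test, 'serif'), 'font'), '>', '<') as T1,
     RowNo() as RowNo;
LOAD Test
FROM
[D:\Qlik\Com\test.xlsx]
(ooxml, embedded labels, table is Sheet1);

// Join the fragments back into one string per original value, sorted by RowNo.
Load Test, Concat(T1, ' ', RowNo) as T2 Resident Temp Group by Test order by RowNo;

DROP Table Temp;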

Output:

[Image: MayilVahanan_0-1622712970598.png]
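
If the SubField-based split does not fit the data, another option is to step through every > ... < pair using the optional fourth argument of TextBetween (the occurrence number) together with a WHILE clause. The following is only a minimal sketch: the table names Source, Pieces and Plain, the fields Seq, Piece and PlainText, and the shortened inline sample are assumptions made for illustration. In a real app, Source would be the table that already holds ID and Text.

```
// Hypothetical sample data; replace with the real table holding ID and Text.
Source:
LOAD * INLINE [
ID|Text
1|<p><span>OK to Pay <o:p></o:p></span></p><p><span>Indemnity Loss</span></p>
] (delimiter is '|');

// One row per '>' in the string: IterNo() selects the n-th '>' and
// TextBetween returns the text from there up to the next '<'.
Pieces:
LOAD ID,
     IterNo() AS Seq,
     Trim(TextBetween(Text, '>', '<', IterNo())) AS Piece
RESIDENT Source
WHILE IterNo() <= SubStringCount(Text, '>');

// Drop the empty pieces (adjacent tags) and join the rest in original order.
Plain:
LOAD ID,
     Concat(Piece, ' ', Seq) AS PlainText
RESIDENT Pieces
WHERE Len(Piece) > 0
GROUP BY ID;

DROP TABLE Pieces;
```

This sketch treats every > as the end of a tag, so a literal > inside an attribute value or in the visible text would produce wrong pieces, and HTML entities such as &amp; are not decoded. Rows whose Text contains no > at all produce no pieces and therefore do not appear in Plain.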


Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.
sahilkain1
Contributor II
Author

Hi Mayil,

Thanks for your reply.
The code didn't work as expected. If possible, could we have a working session?

Thanks & Regards,

Sahil