sahilkain1
Contributor II

Need to extract the text between > and < from the HTML Code

Hi All,

I have a column coming from a DB that holds data in the form of HTML code. I need to extract the plain text from the HTML. One possible approach I found is to take the text between the > and < brackets using the TextBetween function, but the Do While loop shown in the link below is not working for my code.

https://community.qlik.com/t5/App-Development/Extract-Plain-Text-from-HTML/td-p/1670967

Table 1:

Load ID,
            Text
From Table XYZ;

My Text is in the format below:

<html><head></head><body><div><p class=MsoNormal style="MARGIN: 0in 0in 0pt; LINE-HEIGHT: normal"><span style='FONT-SIZE: 10pt; FONT-FAMILY: "Cambria",serif'>OK to Pay <?xml:namespace prefix = "o" ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p></span></p><p class=MsoNormal style="MARGIN: 0in 0in 0pt; LINE-HEIGHT: normal"><span style='FONT-SIZE: 10pt; FONT-FAMILY: "Cambria",serif'>Indemnity Loss <o:p></o:p></span></p><p class=MsoNormal style="MARGIN: 0in 0in 0pt; LINE-HEIGHT: normal"><span style='FONT-SIZE: 10pt; FONT-FAMILY: "Cambria",serif'>Payee: <o:p></o:p></span></p><p class=MsoNormal style="MARGIN: 0in 0in 0pt; LINE-HEIGHT: normal"><span style='FONT-SIZE: 10pt; FONT-FAMILY: "Cambria",serif'>Invoice: 5710 fees for <o:p></o:p></span></p><p class=MsoNormal style="MARGIN: 0in 0in 0pt; LINE-HEIGHT: normal"><span style='FONT-SIZE: 10pt; FONT-FAMILY: "Cambria",serif'>Amount: $ 2689.82</span></p><p class=MsoNormal style="MARGIN: 0in 0in 0pt; LINE-HEIGHT: normal"><span style='FONT-SIZE: 10pt; FONT-FAMILY: "Cambria",serif'>Code: AT <a onclick="OpenNewBrowser( ' ../../Desktop/FileNotes/FNViewAttachment.aspx?UIC=M%3d1%26FileNoteID%3dC1A39398DB6CB7B4%26A%3d13%26ClaimID%3d5AA130AEE2C40E72%26AttachmentID%3dFADE85B765ECD459 ', '', 'width=990,height=641,status,statusbar,scrollbars,resizable,menubar,top=0,left=0');" title="Legal Cor" href="javascript&colon;void(0)"><font color=#0066cc>WCS20200327_09190385.PDF</font></a><o:p></o:p></span></p></div></body></html>

 

Output Expected:
OK to Pay Indemnity Loss Payee Invoice: 5710 fees for Amount: $ 2689.82 Code: AT WCS20200327_09190385.PDF

 

Thanks in Advance

2 Replies
MayilVahanan

Hi @sahilkain1

Try something like the below:

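Temp:

// Split each HTML string on 'serif' and then on 'font', and keep the text
// between the first '>' and the following '<' of each segment.
Load *, TextBetween(SubField(SubField(Test, 'serif'), 'font'), '>','<') as T1, RowNo() as RowNo;
LOAD Test
FROM
[D:\Qlik\Com\test.xlsx]
(ooxml, embedded labels, table is Sheet1);

// Join the extracted pieces back into one string per original value, in row order.
Load Test, Concat(T1,' ', RowNo) as T2 Resident Temp Group by Test order by RowNo;

DROP Table Temp;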

O/P:

[Screenshot of the output: MayilVahanan_0-1622712970598.png]

 

Thanks & Regards, Mayil Vahanan R
Please close the thread by marking correct answer & give likes if you like the post.
sahilkain1
Contributor II
Author

Hi Mayil,

Thanks for your reply...
The code didn't work as expected. If possible, could we have a working session?

Thanks & Regards,

Sahil
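
For reference, the "text between > and <" idea from the question can also be sketched as a While-loop load, which does not depend on the 'serif' and 'font' markers used in the reply above. Note that the reply reads a field named Test from an Excel file, while the question's table has ID and Text, so the field names have to be adjusted either way. This is only a minimal sketch under assumptions: RawNotes is a hypothetical name for the already loaded table with the fields ID and Text, the names Pieces, Plain, Seq, Piece and PlainText are made up for illustration, and it assumes no '>' character appears inside a tag attribute.

```
// Assumption: RawNotes is the already loaded table with the fields ID and Text.
// One output row per '>' in the HTML: the n-th row holds the text between
// the n-th '>' and the '<' that follows it.
Pieces:
LOAD
    ID,
    IterNo() as Seq,
    Trim(TextBetween(Text, '>', '<', IterNo())) as Piece
RESIDENT RawNotes
WHILE IterNo() <= SubStringCount(Text, '>');

// Drop the empty pieces (tags that directly follow each other) and join
// the rest per ID, using Seq as the sort weight to keep the original order.
Plain:
LOAD
    ID,
    Concat(Piece, ' ', Seq) as PlainText
RESIDENT Pieces
WHERE Len(Piece) > 0
GROUP BY ID;

DROP TABLE Pieces;
```

With the sample HTML from the question this should give one PlainText value per ID, close to the expected output. HTML entities such as &amp; are not decoded by this sketch and would need an extra Replace() step.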