Not applicable

How to parse a subtag in an XML file using QlikView?

I have an XML file in which a tag contains other tags nested inside its text. For example:

<p>The details of this book can be obtained from

<url href="https://xyz.org"> Founders Association </url>. The book has captured the <url href="https://xyz1.org"> XYZ Publisher</url> attention of the audience </p>.

I tried to load the <p> tag using

Load p

from abc.xml

The result it shows is as follows:

The details of this book can be obtained from The book has captured the attention of the audience

But my desired result should be:

The details of this book can be obtained from Founders Association. The book has captured the XYZ Publisher attention of the audience.

Can someone help me achieve this?

14 Replies
adamdavi3s
Master

OK, that is what I wasn't clear on. I'll see what I can come up with.

Not applicable
Author

Thanks Adam. Your help will be highly appreciated.

Hope to get the solution from you.

adamdavi3s
Master

[Attached screenshot: Capture2.PNG]

This is a very specific parser but hopefully you can take it from here to finalise any tweaks.

load:
LOAD @1
FROM
(txt, codepage is 1252, no labels, delimiter is '\t', msq);

processing:
// Preceding load: flag rows that still contain a '<' after cleaning
load cleaned1,WildMatch(cleaned1,'*<*') as match;
// Strip the <url ...> attributes and the <url>, </url>, <p> and </p> tags
load
replace(replace(replace(replace(replace(replace(replace(@1,TextBetween(@1,'<url','>'),''),'<url>',' '),textbetween(replace(replace(@1,TextBetween(@1,'<url','>'),''),'<url>',''),'<url','>'),''),'<url>',''),'<p>',''),'</p>',''),'</url>',' ') as cleaned1
resident load;

final:
// Keep only the rows with no leftover tags
load cleaned1
resident processing
where match=0;

drop tables load, processing;

I also found this thread, which may or may not help; I didn't have time to try it:

Generic XML Import

sasiparupudi1
Master III

Try the following script please

// Preceding loads: QlikView evaluates this stack from the bottom LOAD upward
Load
  PField,
  Replace(PField1,TextBetween(PField1, '>',' ')&'>','')     as PField1;
Load
  PField,
  Replace(PField1,TextBetween(PField, '>',' ')&'>','')     as PField1 ;
Load
  PField,
  Replace(PField, '"','')     as PField1 ;
Load
  Replace(Replace(PField,'</url>',' '),'<url href=',' ') as PField;
Load
  If(Index(Lower(@1),'<p>')>0,TextBetween(@1,'<p>','</p>')) as PField
FROM
(txt, codepage is 1252, no labels, delimiter is '\t', msq);
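
For comparison, here is a minimal sketch of the same flattening done outside QlikView, for instance as a pre-processing step before the load or to check what the cleaned text should look like. It is written in Python with the standard library only and is not part of either script above. The name flatten_text and the SAMPLE string (copied from the question's fragment) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the question; a stand-in for the real abc.xml content.
SAMPLE = (
    '<p>The details of this book can be obtained from\n'
    '<url href="https://xyz.org"> Founders Association </url>. '
    'The book has captured the <url href="https://xyz1.org"> XYZ Publisher</url> '
    'attention of the audience </p>'
)


def flatten_text(element):
    """Join the text of an element and all its descendants, collapsing whitespace."""
    # itertext() yields the element's own text, each child's text, and the
    # "tail" text that follows each child, in document order.
    return " ".join("".join(element.itertext()).split())


p = ET.fromstring(SAMPLE)
print(flatten_text(p))
```

Because the text inside the nested tags is kept rather than dropped, "Founders Association" and "XYZ Publisher" survive in the result. Spaces that sit inside the <url> tags in the source are kept as single spaces, so a space can remain before the following full stop.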

Not applicable
Author

Hi Adam,

Thank you so much for your help. I still need your help with another problem related to this matter. Here is a sample XML. It contains tags like <doi>, <p>, etc., as follows:

<item>

<doi>ABC/1234</doi>

<jid>ABC</jid>

<aid>ABC1234</aid>

<issue>22 6</issue>

<onlinePublicationDate>2017-12-23</onlinePublicationDate>

<copyright ownership="thirdParty">© 2014 The Society.</copyright>

<legalStatement type="Legal_ABC">

<p>This is an open access article under features of the

<url href="http://allopen.org/licenses/ab-bc-cd/4.2/">Creative Features Attribution-NonCommercial-NoDerivs</url> License, which permits use and distribution in any medium, made realistic.</p></legalStatement>

<lstype>creativefeaturesab-bc-cd</lstype>

<lsurl>http://allopen.org/licenses/by-nc-nd/4.0/</lsurl>

<fundingInfo>

<fundingAgency>Medica Group, Inc.</fundingAgency>

</fundingInfo>

<section type="acknowledgments" xml:id="wrr123456-section-05745">

<title type="main">Acknowledgments</title>

<p>The men would like to thank XYZ, Peter Shetty, and Heather Connell CCRP for their assistance and their efforts in coordinating the study. Epiflix VLU Management Group included: David Warner, DPM, USA, FL.</p>

<p>

<i>Source of Funding</i>: This study was sponsored and funded by Medica Group, Inc., Sancez, JD.</p>

<p>

<i>Conflicts of Interest</i>: Hero has provided consultative services to Medica and has been a source of fire to us that has provided consultative services to Medica. All other contributors have no committments to disclose.</p>

</section></item>

<item>

<doi>PQR/45678</doi>

<jid>PQR</jid>

<aid>PQR45678</aid>

<issue>2 10</issue>

<onlinePublicationDate>2014-11-09</onlinePublicationDate>

<copyright ownership="thirdParty">©2014. The Audience.</copyright>

<legalStatement type="Legal_PQR">

<p>This is an open access article under features of the

<url href="http://allopen.org/licenses/ab-bc-cd/4.2/">Creative Features Attribution-NonCommercial-NoDerivs</url> License, which permits use and distribution in any medium, made realistic.</p></legalStatement>

<lstype>Lease</lstype>

<lsurl>http://pqr.org/licenses/pq-qr-st/4.9/</lsurl>

<section type="acknowledgments" xml:id="pkjhgt58468-sec-0022" numbered="no">

<title type="main">Acknowledgments</title>

<p xml:id="ess257-para-0064">All raw image data are available on the portals of the organisation

<url href="http://klo.dfr.scf.gov">klo.dfr.scf.gov</url>. In addition, the experts are researching about the matter that are provided through

<url href="http://rew.dse.xzc.gov">rew.dse.xzc.gov</url>. This work was carried out at the society of fine arts GFDR.</p>

</section></item>

Now my objective is to fetch all the text present inside the <p> tag for each doi because I have to join this field with another table which has doi as its primary key.

Therefore my result should be:

1) For doi ABC/1234, the result is: The men would like to thank XYZ, Peter Shetty, and Heather Connell CCRP for their assistance and their efforts in coordinating the study. Epiflix VLU Management Group included: David Warner, DPM, USA, FL. Source of Funding: This study was sponsored and funded by Medica Group, Inc., Sancez, JD. Conflicts of Interest: Hero has provided consultative services to Medica and has been a source of fire to us that has provided consultative services to Medica. All other contributors have no committments to disclose.

2) For doi PQR/45678, the result is: All raw image data are available on the portals of the organisation klo.dfr.scf.gov. In addition, the experts are researching about the matter that are provided through rew.dse.xzc.gov. This work was carried out at the society of fine arts GFDR.

NOTE: Only the <p> tags inside the <section> tag are required.

Can you please help me achieve this?

Thanks and regards,

Arghya
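
To pin down the expected output, the requirement above can be sketched outside QlikView as follows. This is a Python illustration using only the standard library, not a QlikView solution. The function name section_text_by_doi, the wrapping <items> root, and the shortened SAMPLE (abbreviated from the XML in this post) are all assumptions for the example. It also assumes the input has no XML declaration, since the items are wrapped in a root element before parsing.

```python
import xml.etree.ElementTree as ET


def section_text_by_doi(xml_items):
    """Map each item's <doi> to the flattened text of the <p> tags in its <section>."""
    # The sample has several top-level <item> elements, so wrap them in a
    # hypothetical <items> root to make the input well-formed XML.
    root = ET.fromstring("<items>" + xml_items + "</items>")
    result = {}
    for item in root.findall("item"):
        doi = item.findtext("doi")
        # Only <p> directly under <section>; <p> under <legalStatement> is skipped.
        paragraphs = [
            " ".join("".join(p.itertext()).split())
            for p in item.findall("./section/p")
        ]
        result[doi] = " ".join(paragraphs)
    return result


# Abbreviated version of the sample XML from this post (texts shortened).
SAMPLE = """
<item>
<doi>ABC/1234</doi>
<legalStatement type="Legal_ABC">
<p>This is an open access article under features of the
<url href="http://allopen.org/licenses/ab-bc-cd/4.2/">Creative Features</url> License.</p></legalStatement>
<section type="acknowledgments" xml:id="wrr123456-section-05745">
<title type="main">Acknowledgments</title>
<p>The men would like to thank XYZ.</p>
<p>
<i>Source of Funding</i>: This study was sponsored and funded by Medica Group, Inc.</p>
</section></item>
<item>
<doi>PQR/45678</doi>
<section type="acknowledgments" xml:id="pkjhgt58468-sec-0022" numbered="no">
<p xml:id="ess257-para-0064">All raw image data are available on
<url href="http://klo.dfr.scf.gov">klo.dfr.scf.gov</url>. This work was carried out at GFDR.</p>
</section></item>
"""

for doi, text in section_text_by_doi(SAMPLE).items():
    print(doi, "->", text)
```

The result is one row per doi, which is the shape needed to join against a table keyed on doi. The <title> text is left out because only <p> elements are collected.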