I have an XML file in which one tag contains another tag. For example:
<p>The details of this book can be obtained from
<url href="https://xyz.org"> Founders Association </url>. The book has captured the <url href="https://xyz1.org"> XYZ Publisher</url> attention of the audience </p>.
I tried to load the <p> tag using
Load p
from abc.xml
The result it shows is as follows:
The details of this book can be obtained from The book has captured the attention of the audience
But my desired result should be:
The details of this book can be obtained from Founders Association. The book has captured the XYZ Publisher attention of the audience.
Can someone help me achieve this?
OK, that is what I wasn't clear on. I'll see what I can come up with.
Thanks Adam. Your help will be highly appreciated.
Hope to get the solution from you.
This is a very specific parser but hopefully you can take it from here to finalise any tweaks.
// Read every line of the file as a single text field (@1)
load:
LOAD @1
FROM
(txt, codepage is 1252, no labels, delimiter is '\t', msq);
// Preceding load: the lower LOAD strips the tags, the upper LOAD flags rows that still contain '<'
processing:
load cleaned1,WildMatch(cleaned1,'*<*') as match;
load
replace(replace(replace(replace(replace(replace(replace(@1,TextBetween(@1,'<url','>'),''),'<url>',' '),textbetween(replace(replace(@1,TextBetween(@1,'<url','>'),''),'<url>',''),'<url','>'),''),'<url>',''),'<p>',''),'</p>',''),'</url>',' ') as cleaned1
resident load;
// Keep only the rows in which no tag remains
final:
load cleaned1
resident processing
where match=0;
drop tables load, processing;
I also found this thread, which may or may not help; I didn't have time to try it out.
Please try the following script:
Load
PField,
Replace(PField1,TextBetween(PField1, '>',' ')&'>','') as PField1;
Load
PField,
Replace(PField1,TextBetween(PField, '>',' ')&'>','') as PField1 ;
Load
PField,
Replace(PField, '"','') as PField1 ;
Load
Replace(Replace(PField,'</url>',' '),'<url href=',' ') as PField;
Load
If(Index(Lower(@1),'<p>')>0,TextBetween(@1,'<p>','</p>')) as PField
FROM
(txt, codepage is 1252, no labels, delimiter is '\t', msq);
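For comparison, the output asked for in the first post can be sketched outside Qlik. This is only an illustration in Python (not Qlik script), assuming the <p> fragment is well-formed XML; the function name flatten_text is made up for the example. It joins the text of the tag and of every nested tag, then collapses runs of whitespace.

```python
import xml.etree.ElementTree as ET


def flatten_text(xml_fragment):
    """Return the text of an element including the text of its nested tags."""
    element = ET.fromstring(xml_fragment)
    # itertext() walks the element and all descendants in document order
    joined = "".join(element.itertext())
    # Collapse line breaks and repeated spaces into single spaces
    return " ".join(joined.split())


# Sample taken from the first post (hypothetical variable name)
sample = (
    '<p>The details of this book can be obtained from\n'
    '<url href="https://xyz.org"> Founders Association </url>. '
    'The book has captured the <url href="https://xyz1.org"> XYZ Publisher</url> '
    'attention of the audience </p>'
)

flat = flatten_text(sample)
print(flat)
```

Because the sample has a space before </url>, the flattened text keeps a space between "Founders Association" and the following period.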
Hi Adam,
Thank you so much for your help. I still need your help with another problem related to this matter. Here is a sample XML file. It contains tags such as <doi> and <p>, as follows:
<item>
<doi>ABC/1234</doi>
<jid>ABC</jid>
<aid>ABC1234</aid>
<issue>22 6</issue>
<onlinePublicationDate>2017-12-23</onlinePublicationDate>
<copyright ownership="thirdParty">© 2014 The Society.</copyright>
<legalStatement type="Legal_ABC">
<p>This is an open access article under features of the
<url href="http://allopen.org/licenses/ab-bc-cd/4.2/">Creative Features Attribution-NonCommercial-NoDerivs</url> License, which permits use and distribution in any medium, made realistic.</p></legalStatement>
<lstype>creativefeaturesab-bc-cd</lstype>
<lsurl>http://allopen.org/licenses/by-nc-nd/4.0/</lsurl>
<fundingInfo>
<fundingAgency>Medica Group, Inc.</fundingAgency>
</fundingInfo>
<section type="acknowledgments" xml:id="wrr123456-section-05745">
<title type="main">Acknowledgments</title>
<p>The men would like to thank XYZ, Peter Shetty, and Heather Connell CCRP for their assistance and their efforts in coordinating the study. Epiflix VLU Management Group included: David Warner, DPM, USA, FL.</p>
<p>
<i>Source of Funding</i>: This study was sponsored and funded by Medica Group, Inc., Sancez, JD.</p>
<p>
<i>Conflicts of Interest</i>: Hero has provided consultative services to Medica and has been a source of fire to us that has provided consultative services to Medica. All other contributors have no committments to disclose.</p>
</section></item>
<item>
<doi>PQR/45678</doi>
<jid>PQR</jid>
<aid>PQR45678</aid>
<issue>2 10</issue>
<onlinePublicationDate>2014-11-09</onlinePublicationDate>
<copyright ownership="thirdParty">©2014. The Audience.</copyright>
<legalStatement type="Legal_PQR">
<p>This is an open access article under features of the
<url href="http://allopen.org/licenses/ab-bc-cd/4.2/">Creative Features Attribution-NonCommercial-NoDerivs</url> License, which permits use and distribution in any medium, made realistic.</p></legalStatement>
<lstype>Lease</lstype>
<lsurl>http://pqr.org/licenses/pq-qr-st/4.9/</lsurl>
<section type="acknowledgments" xml:id="pkjhgt58468-sec-0022" numbered="no">
<title type="main">Acknowledgments</title>
<p xml:id="ess257-para-0064">All raw image data are available on the portals of the organisation
<url href="http://klo.dfr.scf.gov">klo.dfr.scf.gov</url>. In addition, the experts are researching about the matter that are provided through
<url href="http://rew.dse.xzc.gov">rew.dse.xzc.gov</url>. This work was carried out at the society of fine arts GFDR.</p>
</section></item>
My objective now is to fetch all the text inside the <p> tags for each doi, because I have to join this field with another table that has doi as its primary key.
Therefore my result should be:
1) for doi ABC/1234, the result is: The men would like to thank XYZ, Peter Shetty, and Heather Connell CCRP for their assistance and their efforts in coordinating the study. Epiflix VLU Management Group included: David Warner, DPM, USA, FL. Source of Funding : This study was sponsored and funded by Medica Group, Inc., Sancez, JD. Conflicts of Interest : Hero has provided consultative services to Medica and has been a source of fire to us that has provided consultative services to Medica. All other contributors have no committments to disclose.
2) for doi PQR/45678, the result is: All raw image data are available on the portals of the organisation klo.dfr.scf.gov . In addition, the experts are researching about the matter that are provided through rew.dse.xzc.gov . This work was carried out at the society of fine arts GFDR.
NOTE: <p> tag only inside the <section> tag is required.
Can you please help me achieve this?
Thanks and regards,
Arghya
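The per-doi result described above can be sketched as follows. This is a rough illustration in Python rather than Qlik script, under several assumptions: the file is a plain sequence of <item> elements with no XML declaration, each <section> is a direct child of its <item>, and the names section_text_by_doi and demo_items are invented for the example. Only <p> tags inside <section> are collected, so the <p> inside <legalStatement> is ignored.

```python
import xml.etree.ElementTree as ET


def section_text_by_doi(xml_items):
    """Map each doi to the joined text of the <p> tags inside its <section>."""
    # The sample has several top-level <item> elements, so wrap them in one root
    root = ET.fromstring("<items>" + xml_items + "</items>")
    result = {}
    for item in root.iter("item"):
        doi = item.findtext("doi")
        paragraphs = []
        # Only <p> directly under <section>; <legalStatement>/<p> is skipped
        for p in item.findall("./section/p"):
            text = "".join(p.itertext())          # includes <i>, <url> text
            paragraphs.append(" ".join(text.split()))
        result[doi] = " ".join(paragraphs)
    return result


# Shortened stand-in for the sample XML (hypothetical data)
demo_items = (
    '<item><doi>ABC/1234</doi>'
    '<legalStatement><p>License text</p></legalStatement>'
    '<section type="acknowledgments"><p>\n<i>Source of Funding</i>: funded by '
    'Medica Group.</p><p>Second paragraph.</p></section></item>'
    '<item><doi>PQR/45678</doi><section><p>Data at\n'
    '<url href="http://klo.dfr.scf.gov">klo.dfr.scf.gov</url>. Done.</p>'
    '</section></item>'
)

texts = section_text_by_doi(demo_items)
for doi, text in texts.items():
    print(doi, "->", text)
```

The resulting doi/text pairs could then be written to a delimited file and loaded into Qlik for the join on doi.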