Error processing resource while parsing XML with '&' symbol
Hi,
I want to parse XML with a Talend job. My XML contains special characters such as "&", "<", ">" and quotation marks. How can I replace these special characters while parsing the XML file through Talend?
Sample XML:
<?xml version="1.0"?>
<Extract>
<Record>
<ID>1</ID>
<NAME>Product 1</NAME>
<ATTS>
<ATT>Me & my attribute</ATT>
<ATT>Another attribute</ATT>
</ATTS>
</Record>
<Record>
<ID>2</ID>
<NAME>Product 2</NAME>
<ATTS>
<ATT>Foo attribute</ATT>
<ATT>Bar <br />attribute</ATT>
</ATTS>
</Record>
<Record>
<ID>3</ID>
<NAME>Product 3</NAME>
<ATTS>
<ATT>John Doe attribute</ATT>
<ATT>Foo & bar</ATT>
</ATTS>
</Record>
</Extract>
Please help me.
Thanks,
Chin
I am still not able to get this working with my XML files.
See below:
My requirement: I have around 30 zip folders.
I did the following:
Step 1: Unzip each folder with tSystem and put the contents in a temporary folder
tFileList_1 --> tSystem
"unzip "+((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) +" -d " + context.tempdirectory
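The command string above is built by plain concatenation, so a path containing spaces would be split by the shell. The same step can also be done without shelling out. A minimal sketch, assuming Java 7 or later (which may not match the JVM an older Talend release runs on); the class name UnzipSketch and the method unzip are hypothetical, not Talend APIs:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class UnzipSketch {
    /** Extracts every entry of zipFile into targetDir (hypothetical helper). */
    public static void unzip(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies the bytes of the current entry only; the zip stream stays open.
                    Files.copy(zis, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

In a job this could be called with the current file path from tFileList_1 and context.tempdirectory, for example UnzipSketch.unzip(Paths.get(path), Paths.get(context.tempdirectory)).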
I opened my XML file and it is in the format shown below, which is why the job throws this error:
Exception in component tFileInputXML_1
org.dom4j.DocumentException: Error on line 15 of document : The entity name must immediately follow the '&' in the entity reference. Nested exception: The entity name must immediately follow the '&' in the entity reference.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc >
<EMPS>
<EMP>
<STAFF>
<EMPCODE>111</EMPCODE>
<EMPDESIG>BA</EMPDESIG>
<DEPT>FIN</DEPT>
</STAFF>
<PERMANENT>
<ADDRESS>
<ADDRCODE>XX</ADDRCODE>
<ADDRCODE>ABCDE</ADDRCODE>
</ADDRESS>
</PERMANENT>
<FEEDBACK>
The Definitive Guide we offer a step by step guide on
how to install MongoDB and get it up and running smoothly.
Precompiled binaries are available for Linux, Mac OS X, Windows,
and Solaris. On most platforms you can download the archive from mongodb.org,
inflate it, and run the binary. "there is ink" in Fig. 3 The MongoDB server requires a directory it can write
database files to and a port it can listen for connections on.
The following section covers the entire install on the two variants of system:
Windows and everything else (Linux, Max, Solaris). 200 is A2', A1 > A2 > A3 - A7 is
Precompiled binaries are available for Linux, Mac OS X, Windows,
and Solaris. On most platforms you can download the archive from mongodb.org,
inflate it, and run the binary.
</FEEDBACK>
</EMP>
</EMPS>
I am using Talend 4.0.3 r47759. As the attached screenshot 4.png shows, tRunJob_1 throws this error:
Exception in component tFileInputXML_1
org.dom4j.DocumentException: Error on line 15 of document : The entity name must immediately follow the '&' in the entity reference. Nested exception: The entity name must immediately follow the '&' in the entity reference.
Please find attached screenshot of my jobs.
That tFileInputXML configuration doesn't match the example XML you posted. It loops on /simple-patent-document/bibliographic-data, which doesn't appear in your example. We can't help unless you post the correct data.
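On the original '&' error: the parser rejects a bare '&' because XML requires it to be written as an entity reference such as &amp;. If the files cannot be fixed where they are produced, one possible workaround is to escape bare ampersands before the file reaches tFileInputXML. A minimal sketch, assuming plain Java run as a pre-processing step; the class and method names are hypothetical, and it assumes the documents declare no custom entities in their DTD (any such reference would be escaped too). It handles only ampersands, not a stray '<'.

```java
import java.util.regex.Pattern;

public class AmpersandEscaper {
    // Matches an '&' that does not start one of the five predefined XML entities
    // or a decimal/hexadecimal character reference.
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#\\d+|#x[0-9a-fA-F]+);)");

    /** Returns the input with every bare '&' rewritten as "&amp;" (hypothetical helper). */
    public static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String raw = "<ATT>Me & my attribute</ATT><ATT>Foo &amp; bar</ATT>";
        // prints: <ATT>Me &amp; my attribute</ATT><ATT>Foo &amp; bar</ATT>
        System.out.println(escapeBareAmpersands(raw));
    }
}
```

References that are already escaped are left alone, so running the step twice does not produce "&amp;amp;".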