
XML special chars not converted
Hi all,
I have an Excel file like this
and a Talend Job (TOS 8) that reads the file and produces an output file (XML)
but the special chars double quote (") and single quote (') are not converted:
Anyone know how to fix?
Thanks!

In the scenario you are demonstrating, there is no need to convert the " or ' chars. That is perfectly acceptable XML. However, if you want to convert all Strings regardless of whether it is necessary, you can use code like this:
System.out.println(TalendString.replaceSpecialCharForXML("A test String with ' and \", > and <"));
Dump the above in a tJava component to test it. The "System.out.println" just prints out to the output window. The important bit is the "TalendString.replaceSpecialCharForXML("A test String with ' and \", > and <")" section. This method will replace any special chars in your Strings.

Thanks for the reply.
I am using some software that requires " and ' to be converted (they are special chars in an XML file).
I knew about the solution you are suggesting, but it does not work well for my purpose because, as I expected, the & gets converted twice.
Anyway, IMHO, this is a bug in the tAdvancedOutputXML component.
Thanks again, but the problem still remains.
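The double conversion mentioned above can be reproduced outside Talend. This is only a minimal sketch: it uses the plain JDK DOM and Transformer classes rather than tAdvancedOutputXML, and `DoubleEscapeDemo` and `preEscape` are hypothetical names standing in for any step that escapes a String before an XML writer escapes it again.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DoubleEscapeDemo {

    // Hypothetical pre-escaping step (a stand-in for any manual conversion).
    static String preEscape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Puts the text into a <value> element and serializes it with the JDK Transformer.
    static String serialize(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element element = doc.createElement("value");
        element.setTextContent(text);
        doc.appendChild(element);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // The serializer escapes the & of the already-escaped &quot; a second time.
        System.out.println(serialize(preEscape("say \"hi\"")));
    }
}
```

The & that the first step introduced is itself a special char for the serializer, which is why escaping before the XML writer runs cannot give a clean &quot; in the file.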

I see your problem here, so I have spent some time looking into this to see how I could help. Unfortunately I don't think you will like what I have found. This is not a Talend issue, I'm afraid. It comes down to Java and the XML specification. Single quotes and double quotes are perfectly acceptable in XML element values, so they are not automatically converted. Talend is not doing this itself; the conversion is handled by the Java libraries being used, and it appears to be pretty consistent.
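As a small illustration of the point about element values, the sketch below parses one element written with raw quotes and one written with &quot; and &apos;, and compares the text a parser hands back. The class and method names (`QuoteEquivalenceDemo`, `textOf`) are made up for this example; only standard JDK parser classes are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class QuoteEquivalenceDemo {

    // Parses the XML and returns the text content of the root element.
    static String textOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String raw = "<body>Don't say \"hi\"</body>";
        String escaped = "<body>Don&apos;t say &quot;hi&quot;</body>";

        // Both forms are well-formed and carry the same text.
        System.out.println(textOf(raw));
        System.out.println(textOf(raw).equals(textOf(escaped)));
    }
}
```

A conforming consumer should therefore not care which of the two forms appears inside an element.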
To prove this, I have built a quick demo that you can try out on your machine. It is simply a job with a tJava and a routine I have hacked together quickly. The tJava code is below:
String text = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
System.out.println(text);
System.out.println(routines.XMLUtils.updateXML(text));
Essentially what I am doing here is creating a simple XML String as "text". I am then printing it to System.out. Then I am calling the routine I will share next to edit the XML.
The routine:
package routines;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XMLUtils {

    // Parses the XML, overwrites every text node, and serializes the document back to a String.
    public static String updateXML(String xml) {
        // Created up front so that a failure returns an empty String instead of throwing a NullPointerException
        StringWriter writer = new StringWriter();
        try {
            DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Encode as UTF-8 to match the encoding declared in the XML header
            Document document = docBuilder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            visit(document, 0);
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.transform(new DOMSource(document), new StreamResult(writer));
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            e.printStackTrace();
        }
        return writer.toString();
    }

    // Recursively prints debug details for each node and overwrites every text node.
    public static void visit(Node node, int level) {
        System.out.println("Name:" + node.getNodeName());
        System.out.println("Value:" + node.getNodeValue());
        NodeList list = node.getChildNodes();
        System.out.println("Number:" + list.getLength());
        for (int i = 0; i < list.getLength(); i++) {
            Node childNode = list.item(i);
            visit(childNode, level + 1);
            if (childNode.getNodeType() == Node.TEXT_NODE) {
                // Set the value in place: removing and re-appending nodes while iterating the live NodeList can skip siblings
                childNode.setNodeValue("<>' \""); //<-- I'm changing all text values to be <>'" here
            }
        }
    }
}
I started building this trying to provide a fix for you, but when I saw it working, I realised that the problem is in the Java XML libraries. You can just copy and paste the routine above into your Studio. Notice the section that says....
"//<-- I'm changing all text values to be <>'" here"
....this is where I was previously taking the original element value and converting it. This time I am simply setting every element to the same thing. I am not manually converting the <, >, ', or " here. I am just adding that string to each element.
When you run this job you will see the original XML printed out like this......
<?xml version="1.0" encoding="UTF-8"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>
Then you will see some debugging outputs, you can ignore those. But when you get to the end you will see this.....
<?xml version="1.0" encoding="UTF-8" standalone="no"?><note><to>&lt;&gt;' "</to><from>&lt;&gt;' "</from><heading>&lt;&gt;' "</heading><body>&lt;&gt;' "</body></note>
Notice that only the < and > are translated and the ' and " remain as they were originally. This shows that Java does not expect those values to cause problems.
Now, I understand that the product you are working with can't work with this. Given what you have seen here, it should be assumed that the issue is with the product you are working with. However, we can still potentially work around this, but it won't be easy.
My suggestion is to build your XML using a tXMLMap and then convert it to a String. Once in String format, you can use String manipulation to find double and single quotes that need altering. Alter those values (be careful not to alter quotes in the XML header and in attributes, etc), then write the converted String to a tFileOutputRaw. This will result in a file that your other application will be able to read.
I know this is a pain, but it is possible to do this. You may need to use regular expressions to make this as safe as possible.
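A rough sketch of the String manipulation described above could look like the following. It is not a tested Talend routine: the class and method names (`QuoteEscaper`, `escapeQuotesInText`) are hypothetical, and it assumes the XML has no CDATA sections, comments, or unescaped > characters inside attribute values, any of which would confuse the regular expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteEscaper {

    // Matches text that sits between the '>' closing one tag and the '<' opening the next,
    // i.e. element content. The XML header and attribute values live inside tags, so they are skipped.
    private static final Pattern TEXT = Pattern.compile(">([^<]+)(?=<)");

    static String escapeQuotesInText(String xml) {
        Matcher matcher = TEXT.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String text = matcher.group(1)
                    .replace("\"", "&quot;")
                    .replace("'", "&apos;");
            // quoteReplacement stops '$' and '\' in the text being treated as group references
            matcher.appendReplacement(out, Matcher.quoteReplacement(">" + text));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<note id=\"n'1\"><body>Don't say \"hi\"</body></note>";
        System.out.println(escapeQuotesInText(xml));
    }
}
```

Because this runs on the finished String rather than on the values going into the XML writer, the & in &quot; and &apos; is not escaped a second time. The result would then be what gets written to the tFileOutputRaw.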

Hi rhall, maybe you are right: the conversion is correct. A double quote is converted when it is inside an attribute value and a single quote is not, but the external tool accepts this XML as valid input. Sorry if you wasted your time 😞
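For anyone who wants to check the attribute behaviour described above, here is a small sketch using the plain JDK serializer (not the Talend component, which is assumed to behave the same way based on the output above). The names `AttributeQuoteDemo`, `item` and `label` are made up for the example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class AttributeQuoteDemo {

    // Writes the same value once as an attribute and once as element text.
    static String serialize(String value) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element item = doc.createElement("item");
        item.setAttribute("label", value);
        item.setTextContent(value);
        doc.appendChild(item);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // The double quote must be escaped in the attribute because it would otherwise
        // end the double-quoted attribute value; in element text it is left alone.
        System.out.println(serialize("5\" x 3'"));
    }
}
```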

Not a problem at all. I learnt something from looking into this, so my time was not wasted at all 🙂
