Anonymous
Not applicable

[resolved] Invalid XML character (Unicode: 0x3) while reading JSON from REST

Hi,
This should be a classic problem, but I can't resolve it at the moment.
I call a REST web service page by page in a loop.
The web service responds with JSON, which I read using a tExtractJSONFields component.
On the 174th page I get this error (in tExtractJSONFields, I guess):
Error on line 2400 of document  : invalid XML character (Unicode: 0x3)  was found in the element content of the document.
I tried adding a tJavaRow between tREST and tExtractJSONFields to strip this character:
String xml11pattern = "+";
output_row.BodyCorr = input_row.Body.replaceAll(xml11pattern, "");
output_row.ERROR_CODE = input_row.ERROR_CODE;

But I still get the same error.
Thanks for your help.
I use TOS 5.6.1.

Accepted Solutions
Anonymous
Not applicable
Author

Many thanks for all your thoughts!
With your help I figured it out.
The solution: the special character was not an actual control character in the JSON string returned by the web service, but the escape sequence written out as plain text (a backslash followed by u0003).
So we need to use: output_row.string=output_row.string.replace("\\u0003", "");
and not replaceAll, because replaceAll treats its first argument as a regex, in which this sequence means the control character itself rather than the literal text.
I found the solution using one of your suggestions: I exported the line to a raw file to work on it separately in a new job.
Using the line number was not possible because there are some huge strings in this JSON flow, so the line reported by tExtractJSONFields was too difficult to find.
Many thanks again and have a good Sunday!
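
For anyone landing here later, the difference between the two calls can be sketched in plain Java outside Talend. The class and method names below are made up for illustration; only String.replace and String.replaceAll are the real calls from the fix above.

```java
// Sketch only: LiteralEscapeDemo and its methods are hypothetical names.
class LiteralEscapeDemo {

    // String.replace takes the argument literally, so this removes the
    // six-character text: backslash, 'u', '0', '0', '0', '3'.
    static String stripLiteralEscape(String json) {
        return json.replace("\\u0003", "");
    }

    // String.replaceAll compiles the argument as a regex. In a regex the same
    // six characters mean "the single control character U+0003", so the
    // literal text is left untouched.
    static String stripControlChar(String json) {
        return json.replaceAll("\\u0003", "");
    }

    public static void main(String[] args) {
        // Raw JSON as received: the escape is still plain text at this point.
        String body = "{\"name\":\"abc\\u0003def\"}";
        System.out.println(stripLiteralEscape(body)); // {"name":"abcdef"}
        System.out.println(stripControlChar(body));   // body is unchanged
    }
}
```

This is why a regex-based cleanup on the raw body has nothing to match: the control character only comes into existence later, once the JSON escape is decoded.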


6 Replies
Anonymous
Not applicable
Author

I suspect we have an issue of XML 1.1 being processed by a parser requiring XML 1.0. XML 1.1 permits the Unicode 0x3 character (as a character reference); XML 1.0 does not allow it at all. The regex you were using is for XML 1.1. Try this one and see if it works for you...
""
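
As a rough sketch of what an XML 1.0 cleanup can look like in Java (not necessarily the exact pattern meant above), the class below keeps only the code points that the XML 1.0 Char production allows: tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Sketch only: Xml10Sanitizer is a hypothetical helper, not a Talend class.
class Xml10Sanitizer {

    // Negated character class: matches every code point that is NOT a legal
    // XML 1.0 character. The \x{...} form needs Java 7 or later.
    private static final Pattern INVALID_XML10 = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Returns the text with all characters that XML 1.0 forbids removed.
    static String sanitize(String text) {
        return INVALID_XML10.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String dirty = "abc" + (char) 0x3 + "def";
        System.out.println(sanitize(dirty)); // abcdef
    }
}
```

Note that this only removes real control characters that are already present in the string; it does not touch escape sequences that are still written out as text.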
Anonymous
Not applicable
Author

Thanks a lot, but I tried:
String xml10pattern = "";
output_row.BodyCorr =input_row.Body.replaceAll(xml10pattern, "");

But I get the same error.
Anonymous
Not applicable
Author

If the error was previously caused by the Unicode 0x3 character, this time it won't be. The previous pattern you used will have ignored the 0x3 char; the one I gave you will have removed it. Can you identify the character that is causing the error after the regex change?
Another way of debugging this is to output the complete document as a String. You have been told that there is an error on line 2400 of the document. From that, you should be able to identify which character(s) are causing the issue and test regexes to handle them.
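
If the document is too large to inspect by eye, a small scan can list the offending characters instead. This is a minimal sketch, assuming the whole body is available as a String; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: InvalidCharFinder is a hypothetical debugging helper.
class InvalidCharFinder {

    // True if the code point is allowed by the XML 1.0 Char production.
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns one entry per invalid code point, with its char offset in the text.
    static List<String> findInvalid(String text) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isValidXml10(cp)) {
                hits.add(String.format("offset %d: U+%04X", i, cp));
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return hits;
    }

    public static void main(String[] args) {
        String body = "ab" + (char) 0x3 + "c";
        System.out.println(findInvalid(body)); // [offset 2: U+0003]
    }
}
```

An empty result on the raw body would suggest that the character is not there yet and only appears after a later decoding step.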
Anonymous
Not applicable
Author

I've just had another thought. I think the error may be in another location. When the job falls over in the Studio, can you post the full error message? You will also notice line numbers associated with the error. Once you identify the line number, switch to the code tab (bottom left corner of your Studio designer window) and go to that line of code. You should be able to work out from that area of code which component is causing the issue and the actual line should give you a clue as to what is causing it. 
As an example, below is a runtime error I just manufactured in a job on my computer, a simple NullPointerException:
Exception in component tMap_2
java.lang.NullPointerException
at mystuff.multiplyrowexample_0_1.MultiplyRowExample.tFixedFlowInput_1Process(MultiplyRowExample.java:752)
at mystuff.multiplyrowexample_0_1.MultiplyRowExample.runJobInTOS(MultiplyRowExample.java:1140)
at mystuff.multiplyrowexample_0_1.MultiplyRowExample.main(MultiplyRowExample.java:997)
Line 752 of the code shows me the precise point where the Null causes an issue.
If you can identify exactly where this is falling over, we can help a bit more. 
I suspect that the error has already occurred before you try to replace the characters. You say you are receiving JSON from the service. I believe it may be processed as XML. This could be because you are not specifying an "Accept" header. Have you tried using the tRestClient component? If not, try that, set the Accept type to JSON, and connect your tExtractJSONFields to the String output.
Lots of suggestions I'm afraid, but it can be tricky to identify web service issues when you can't play with them yourself. Let me know how you get on.
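
To illustrate what specifying an "Accept" header means at the HTTP level, here is a plain-Java sketch. It is not how tRestClient is implemented; the class name, method name and URL are placeholders made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Sketch only: AcceptHeaderSketch and the URL used in main are hypothetical.
class AcceptHeaderSketch {

    // Prepares (but does not send) a GET request that asks the server for JSON.
    static HttpURLConnection prepare(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            // Tells the service which representation the client wants back.
            conn.setRequestProperty("Accept", "application/json");
            return conn; // no network traffic happens until the response is read
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("http://example.invalid/items?page=174");
        System.out.println(conn.getRequestProperty("Accept")); // application/json
    }
}
```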
Anonymous
Not applicable
Author

Thanks for letting us know how you resolved your issue 🙂