Anonymous
Not applicable

An invalid XML character (Unicode: 0xc) was found in the element conte

Hello folks,
I've got a problem with my Zendesk integration.
After I send a request I receive a JSON document. In general my integration works like a charm, but sometimes I receive this error:
Error on line 21531 of document  : An invalid XML character (Unicode: 0xc) was found in the element content of the document. Nested exception: An invalid XML character (Unicode: 0xc) was found in the element content of the document.

Of course, the resulting file cannot be parsed by tFileInputJSON, and the rest of the job fails.
I've read that 0xc is a control character that is illegal in XML and shouldn't be sent in the file anyway. However, because the file contains the content of Zendesk (customer support system) tickets, I cannot guarantee what is inside 😕
To get rid of this illegal character I created a routine with code from Marc McLaren's blog:
Then I use this routine in my flow (as shown in screenshot 1)
(screen shot 1)
Because nothing changed and the file was still malformed (according to tFileInputJSON and the Metadata -> JSON -> input wizard), I made a small modification and added the
if (current == 0xc)
    System.out.println("i: " + i + ": current: " + current);
line, so now I have:
// Keep only characters allowed by the XML 1.0 specification
if ((current == 0x9) ||
    (current == 0xA) ||
    (current == 0xD) ||
    ((current >= 0x20) && (current <= 0xD7FF)) ||
    ((current >= 0xE000) && (current <= 0xFFFD)) ||
    ((current >= 0x10000) && (current <= 0x10FFFF))) {
    out.append(current);
} else {
    // Log every character that is dropped
    System.out.println("i: " + i + ": current: " + current);
}
// Extra check added to catch the form feed (0xc) explicitly
if (current == 0xc) {
    System.out.println("i: " + i + ": current: " + current);
}

Sadly, neither of the new System.out statements prints anything, and the new file is just as 'malformed' as the original one.
Of course, if I replace 0xc with 0x29 just to check that this code is working, I get multiple hits, so the check itself is fine.
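One way to make this check more informative, offered as a minimal sketch and not as the routine from the blog: a form feed printed to the console is invisible, so logging the code point in hex makes a hit easier to spot. Iterating by code point also matters if `current` is a `char`, because a `char` can never reach the 0x10000 to 0x10FFFF range. The class and method names below are hypothetical.

```java
class XmlCharFilter {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops every code point XML 1.0 does not allow and logs its position in hex. */
    static String stripInvalidXmlChars(String in) {
        if (in == null || in.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                // Hex output stays readable even for invisible control characters
                System.out.printf("i: %d: dropped U+%04X%n", i, cp);
            }
            // Advance by one or two chars depending on the code point
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample input with a raw form feed between two words
        String dirty = "ticket\fbody";
        System.out.println(stripInvalidXmlChars(dirty));
    }
}
```

If a version like this still reports no hits for 0xc, the raw file probably does not contain that character as a single raw character at all.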
Do you have any idea what I could do to make it work?
Of course, because customer support data is extremely sensitive and confidential, I cannot send any file as an example 😕
If I copy this file into 
I get a positive validation result, and the malformed line looks like this:
(screen shot 1)
If I open the same file fragment in Notepad++ I also cannot find any special character:
(ss3)

Do you have any idea how to fix this error?
Kind Regards,
Michal
1 Reply
willm1
Creator

+ Are you specifying an encoding when you read your file?
+ If you are and need to replace known 'bad' patterns of characters, how about using something like the sed utility to do a general find/replace in one fell swoop?
+ If this is XML, what about validating prior to processing, and maybe rejecting the file or records?
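The find/replace idea can also be sketched in Java instead of sed. This rests on an assumption, not a confirmed diagnosis: JSON requires control characters inside strings to be escaped, so the form feed may be present in the file as the text `\f` or as a u-escape for 000c. A scan for the raw character would then miss it, and it would only turn into 0xc once the JSON is decoded. The class and method names are hypothetical, and the cleaner should run on the JSON text before it reaches the parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JsonEscapeCleaner {

    // One JSON escape: an escaped backslash, a u-escape with four hex digits,
    // or a single-character escape such as the ones for newline or form feed.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\(?:\\\\|u([0-9a-fA-F]{4})|(.))");

    /** Control characters below 0x20 that XML 1.0 rejects (tab, LF and CR are allowed). */
    static boolean isIllegalControl(int cp) {
        return cp < 0x20 && cp != 0x9 && cp != 0xA && cp != 0xD;
    }

    /** Removes escaped and raw control characters that would be illegal in XML. */
    static String clean(String json) {
        Matcher m = ESCAPE.matcher(json);
        StringBuffer out = new StringBuffer(json.length());
        while (m.find()) {
            boolean drop;
            if (m.group(1) != null) {
                // u-escape: decode the four hex digits and test the value
                drop = isIllegalControl(Integer.parseInt(m.group(1), 16));
            } else {
                // Short escapes for backspace (0x8) and form feed (0xC)
                String c = m.group(2);
                drop = "b".equals(c) || "f".equals(c);
            }
            m.appendReplacement(out, drop ? "" : Matcher.quoteReplacement(m.group()));
        }
        m.appendTail(out);
        // Also remove any raw control characters, in case the producer did not escape them
        return out.toString().replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", "");
    }

    public static void main(String[] args) {
        // JSON text in which the form feed appears as an escape sequence
        String json = "{\"comment\":\"page one\\fpage two\"}";
        System.out.println(clean(json));
    }
}
```

Escaped backslashes are matched as a unit, so a literal backslash followed by `u000c` in the ticket text is left alone. Replacing the dropped sequences with a space instead of an empty string is an equally reasonable choice if word boundaries matter.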