issue with unicode character in JSON with tFileInputJSON_1

soowork · ‎2016-12-19

I am using Talend BigData 5.4.1 (5.4.1.r111943). Similar to https://community.talend.com/t5/Design-and-Development/Invalid-XML-character-in-json-file/m-p/66674, I am encountering an error when trying to consume a JSON file that has a unicode control character in it. In my case it failed with "An invalid XML character (Unicode: 0x1b) was found in the element content of the document." in tFileInputJSON.

Is there a fix or workaround for this issue yet?

Like GuruGulabKhatri, I also tried to strip the unicode character out in a tMap and had no luck (e.g. row1.line.replaceAll("\\u001b", "")).

If there is no fix or workaround, is it known exactly which Unicode characters will cause tFileInputJSON to fail?

Thanks in advance.
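
On the "which characters" question: the wording of the error looks like it comes from an XML parser, so one assumption (not confirmed anywhere in this thread) is that the component passes the JSON content through XML internally. If that holds, the XML 1.0 specification's Char production defines what is accepted: tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF. Anything else, including ESC (U+001B), would be rejected. A minimal Java sketch of that check; the class and method names are made up for illustration and are not part of Talend:

```java
final class XmlChars {
    // XML 1.0 "Char" production:
    //   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXml10(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isValidXml10(0x1B)); // false: ESC is a C0 control character
        System.out.println(isValidXml10(0x09)); // true: tab is explicitly allowed
        System.out.println(isValidXml10('S'));  // true
    }
}
```

Under that assumption the failure is about control characters (plus a few special code points), not about Unicode characters in general.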

Anonymous · ‎2016-12-20

Hi soowork

>I also tried to strip the unicode character out in a tMap and had no luck (e.g. row1.line.replaceAll("\\u001b", "")).
0x1b is not necessarily \u001b (for example, \u00b7 is 0xc2b7 as UTF-8 bytes); you need to find a translation table like this one: https://en.wikipedia.org/wiki/List_of_Unicode_characters
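
To make the code point versus byte distinction concrete, here is a small Java sketch (the class and helper names are hypothetical) that prints the UTF-8 bytes of both characters. Code points up to U+007F encode to a single identical byte in UTF-8, so U+001B is the byte 0x1B, while U+00B7 needs two bytes. If the "Unicode: 0x1b" in the error message is a code point, which is how it reads, then it does refer to U+001B:

```java
import java.nio.charset.StandardCharsets;

final class CodePointBytes {
    // Returns the UTF-8 encoding of s as an uppercase hex string.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u001b")); // 1B   (ESC, one byte)
        System.out.println(utf8Hex("\u00b7")); // C2B7 (middle dot, two bytes)
    }
}
```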

>If there is no fix or workaround, is it known exactly which Unicode characters will cause tFileInputJSON to fail?
I think this might be the case when there is no equivalent of the Unicode character in your target code page (which I would expect to be ISO-8859-15), but this is only a guess as I don't have Talend BigData.

In TIS 5.4, my JSON component has an option in the "Advanced Settings" section to switch the encoding. Have you tried that?

cheers
dj

soowork · ‎2016-12-20

Thanks dj.
In my case I checked the incoming file and it is written there as "\u001BSam", which I interpreted as [ESC]Sam.
That was why I tried to replace "\u001b".
But basically, even if I got the replace to work, that would only help if I did it for all possible breaking characters. Do you know if it is objecting to any Unicode character, or just to the fact that it is a control character?
I haven't tried changing the encoding; I will explore that. Ideally, though, I would like to strip or ignore such control characters rather than allow them through...
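
A possible explanation for the replace having no effect, assuming the file really contains the six literal characters backslash, u, 0, 0, 1, B rather than a raw ESC byte: in Java, the regex "\\u001b" matches the actual character U+001B, which does not exist in the raw text until a JSON parser has decoded the escape. Below is a rough sketch of stripping such escapes from the raw text before it reaches the JSON component. The class and method names are made up, the "disallowed" set is my assumption (C0 controls other than tab, LF and CR), and how to wire it into a job (for example, reading the file as plain lines first) is left open:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JsonControlEscapes {
    // Matches one JSON escape: a backslash followed by either "u" plus four
    // hex digits (captured in group 1) or any single character. Consuming
    // escapes pairwise keeps an escaped backslash followed by the plain text
    // "u001B" from being mistaken for a control-character escape.
    private static final Pattern ESCAPE =
        Pattern.compile("\\\\(?:u([0-9a-fA-F]{4})|.)", Pattern.DOTALL);

    // Assumption: drop C0 controls except tab (0x09), LF (0x0A), CR (0x0D).
    static boolean isDisallowed(int cp) {
        return cp < 0x20 && cp != 0x09 && cp != 0x0A && cp != 0x0D;
    }

    // Removes escaped control characters from raw JSON text; all other
    // escapes are copied through unchanged.
    static String stripEscapedControls(String rawJson) {
        Matcher m = ESCAPE.matcher(rawJson);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String hex = m.group(1);
            boolean drop = hex != null && isDisallowed(Integer.parseInt(hex, 16));
            m.appendReplacement(out, drop ? "" : Matcher.quoteReplacement(m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "{\"name\":\"\\u001BSam\"}"; // the escape as literal text
        // The original attempt looks for a real ESC character and finds none:
        System.out.println(raw.replaceAll("\\u001b", "").equals(raw)); // true
        System.out.println(stripEscapedControls(raw)); // {"name":"Sam"}
    }
}
```

This only handles control characters written as escapes. Raw control bytes inside a JSON string are not valid JSON in the first place, so they would need a separate pass if they can occur.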
