Anonymous
Not applicable

XML Import, invalid Unicode

Hi,

I hope someone can help. I generate XML data (UTF-8) and import it into Qlik Sense.

For some files, the load fails with the following script error:

The following error occurred:

Invalid Unicode character.

On line number: 69. On column number: 22. System ID:


In other words, the file contains an invalid Unicode character.


The XML header looks like this:

<?xml version="1.0" encoding="utf-8"?>

<ArrayOfNcProgram xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

The problem is this:

[Attached screenshot: XML.png]

My load script looks like this:

NCPROGS:
LOAD
    Id AS NC.ProgramID,
    Name AS NC.ProgramName
FROM [lib://Testdaten/$(machinenumber)/ncPrg_$(machinenumber)]
(XmlSimple, table is [ArrayOfNcProgram/NcProgram]);

Is there a way to solve this within Qlik? The XML generator is a standard UTF-8 generator, and correcting the XML itself would take a lot of workarounds. I would prefer to handle the problem in the Qlik load script. The strange text can stay as it is. I tried a few things with the codepage setting in the load statement, but they did not work.

Thanks for your help!

1 Solution

Accepted Solutions
petter
Partner - Champion III

If an XML file contains invalid code sequences and/or invalid code points, it is the producing application's responsibility not to emit them, and that is where the problem should be fixed.

UTF-8 - Wikipedia

RFC 3629 states: "Implementations of the decoding algorithm above MUST protect against decoding invalid sequences."


However, you may be able to use a dedicated tool to repair the file before you process it in Qlik Sense.


The Unix/Linux command-line utility iconv can do exactly that. Windows builds are also available for download: https://dbaportal.eu/2012/10/24/iconv-for-windows/



Example of using iconv to clean a UTF-8 file:

iconv -f utf-8 -t utf-8 -c file.txt > file_clean.txt

This writes a cleaned copy of the file to file_clean.txt, skipping all invalid sequences. Without the redirect, iconv prints the result to standard output and leaves file.txt unchanged.

-f is the source encoding
-t is the target encoding
-c skips any invalid sequence

Taken from: linux - How to remove non UTF-8 characters from text file - Stack Overflow
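
To see what the -c flag does in practice, here is a minimal sketch. The file names sample.txt and sample_clean.txt are made up for illustration, and the sample deliberately contains one stray byte (0xFF) that can never occur in valid UTF-8. It assumes an iconv build that supports -c (GNU iconv does).

```shell
# Create a small sample: readable text with one stray 0xFF byte in the middle.
# \377 is the octal escape for 0xFF, which is never valid in UTF-8.
printf 'Name: abc\377def\n' > sample.txt

# -c drops input that cannot be decoded. The result goes to standard output,
# so it is redirected to a new file rather than overwriting the input.
# Some iconv builds report a non-zero exit status when they drop input,
# hence the "|| true" so a calling script does not stop here.
iconv -f utf-8 -t utf-8 -c sample.txt > sample_clean.txt || true

# Show the cleaned text.
cat sample_clean.txt
```

Note that this only removes bytes that are not valid UTF-8. Characters that are correctly encoded but not allowed in XML 1.0 (most control characters, for example) would pass through unchanged, so it is worth checking what the offending character at the reported line and column actually is.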


3 Replies

Anonymous
Not applicable
Author

Hello Petter,

thank you for your really quick answer. The information is good and helps.

So in fact I now see these possibilities:

1) create correct XML (at the moment we use the XMLSerializer in C#)

2) regularly correct the XML with iconv (as a service)

3) try to find a solution within Qlik (which is not the correct way)

I will work on 1 and 2.

But isn't there also a way to load only the valid file entries and skip or correct the invalid content within Qlik?

Sorry for this last attempt...
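
Option 2 above (cleaning the files regularly before Qlik reads them) could be sketched roughly as follows. The function name clean_xml_dir and the directory arguments are hypothetical, the files are assumed to end in .xml, and the scheduling itself (cron, Windows Task Scheduler, or similar) is left out. This is only a starting point, not a tested service.

```shell
# Hypothetical helper: write a cleaned copy of every *.xml file in SRC to DST.
# The originals are left untouched so they can be inspected later.
clean_xml_dir() {
  src="$1"
  dst="$2"
  mkdir -p "$dst"
  for f in "$src"/*.xml; do
    # Skip the unexpanded pattern when the directory has no .xml files.
    [ -e "$f" ] || continue
    # -c drops undecodable input; "|| true" because some iconv builds
    # return a non-zero status when they had to drop something.
    iconv -f utf-8 -t utf-8 -c "$f" > "$dst/$(basename "$f")" || true
  done
}

# Example call (placeholder paths):
# clean_xml_dir /path/to/raw /path/to/clean
```

The Qlik load script would then point at the cleaned directory instead of the raw one.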

petter
Partner - Champion III

Not that I know of, other than loading the file as plain text first rather than as XML, fixing the content, and finally using a LOAD ... FROM_FIELD ...; statement with XML as the format.

This is rather "complicated", so I wouldn't advise you to do that.