3 replies. Latest reply: Aug 31, 2017 6:44 PM by Petter Skjolden

    XML Import, invalid Unicode

    Thomas Freinbichler

      Hi,

      I hope anyone can help...I generate XML data (UTF-8) and import this data into Qlik Sense.

      In some Files there is the following script error...

       

      The following error occurred:

      Invalid Unicode character.

      On line number: 69. On column number: 22. System ID:


      In other words, the file contains an invalid Unicode character.
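To see exactly which bytes sit at the position Qlik reports, a quick check outside Qlik can help. The following Python sketch is not Qlik functionality: it decodes the file strictly as UTF-8 and reports the line, column, and bytes of the first invalid sequence. The function name and file name are hypothetical, and the column is counted in bytes, so it may not match Qlik's column count exactly.

```python
def find_invalid_utf8(data: bytes):
    """Return (line, column, offending_bytes) for the first invalid UTF-8
    sequence in data, or None if data decodes cleanly.

    Line and column are 1-based; the column counts bytes, not characters.
    """
    try:
        data.decode("utf-8")  # strict mode raises on the first bad sequence
        return None
    except UnicodeDecodeError as err:
        prefix = data[: err.start]
        line = prefix.count(b"\n") + 1
        # rfind returns -1 when there is no newline, so the line starts at 0
        column = err.start - (prefix.rfind(b"\n") + 1) + 1
        return line, column, data[err.start : err.end]


# Example (hypothetical file name):
# with open("ncPrg_1234.xml", "rb") as fh:
#     print(find_invalid_utf8(fh.read()))
```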


      The XML header looks like this:

      <?xml version="1.0" encoding="utf-8"?>

      <ArrayOfNcProgram xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

       

      The problem is shown in the attached screenshot (XML.png).

      My load script looks like this:

      NCPROGS:
            LOAD
                Id AS NC.ProgramID,
                Name AS NC.ProgramName
            FROM [lib://Testdaten/$(machinenumber)/ncPrg_$(machinenumber)]
            (XmlSimple, table is [ArrayOfNcProgram/NcProgram]);

       

      Is there a way to solve this within Qlik? The XML is produced by a standard UTF-8 generator, and correcting the XML would take a lot of work, so I would prefer to handle it in the Qlik load script. The garbled text can stay as it is. I tried specifying a codepage in the LOAD statement, but it did not help.

      Thanks for your help!

        • Re: XML Import, invalid Unicode
          Petter Skjolden

          If an XML file contains "invalid code sequences" and/or "invalid code points", it is the producing application's responsibility not to generate them, and that application should be fixed.

           

          See the UTF-8 article on Wikipedia.

           

          RFC 3629 states "Implementations of the decoding algorithm MUST protect against decoding invalid sequences."
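To make "invalid sequences" concrete: each byte string below is rejected by a strict UTF-8 decoder. This is a small Python illustration that uses the built-in codec as the decoder; the samples are textbook cases, not bytes taken from the poster's file.

```python
# Byte sequences that are NOT valid UTF-8 under RFC 3629.
INVALID_SAMPLES = {
    "lone continuation byte": b"\x80",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded UTF-16 surrogate U+D800": b"\xed\xa0\x80",
    "truncated 3-byte sequence": b"\xe2\x82",
    "code point above U+10FFFF": b"\xf4\x90\x80\x80",
}


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for label, raw in INVALID_SAMPLES.items():
    print(f"{label:35} {raw!r:22} valid={is_valid_utf8(raw)}")
```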


          However, you may be able to use a dedicated tool to clean the file before processing it in Qlik Sense.


          The Unix/Linux command-line utility iconv can do exactly that, and Windows builds of it are also available for download: https://dbaportal.eu/2012/10/24/iconv-for-windows/



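          Example of using iconv to clean a UTF-8 file:

          iconv -f utf-8 -t utf-8 -c file.txt > file-clean.txt

          writes a cleaned copy of the file, skipping all invalid characters. iconv prints its output to standard output, so redirect it to a new file; the original is not modified.

          -f is the source encoding, -t the target encoding, and -c skips any invalid sequence.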

          Source: "How to remove non UTF-8 characters from text file" on Stack Overflow.
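If iconv is not available on the machine that generates the files, a roughly comparable cleanup can be sketched in Python: decoding with errors="ignore" drops undecodable bytes before the text is re-encoded. The function and file names are hypothetical, and for some malformed sequences the exact bytes dropped may differ from what iconv does.

```python
from pathlib import Path


def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop byte sequences that are not valid UTF-8 and return clean UTF-8.

    Roughly comparable to `iconv -f utf-8 -t utf-8 -c`.
    """
    return data.decode("utf-8", errors="ignore").encode("utf-8")


def clean_file(src: str, dst: str) -> None:
    """Write a cleaned copy of src to dst (the original is left untouched)."""
    Path(dst).write_bytes(strip_invalid_utf8(Path(src).read_bytes()))


# Example (hypothetical file names):
# clean_file("ncPrg_1234.xml", "ncPrg_1234_clean.xml")
```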

            • Re: XML Import, invalid Unicode
              Thomas Freinbichler

              Hello Petter,

               

              Thank you for your very quick answer. The information is helpful.

              So I now see these possibilities:

              1) Generate correct XML (at the moment we use the XmlSerializer in C#)

              2) Regularly clean the XML with iconv (as a service)

              3) Try to find a solution within Qlik (which is not the proper way)
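Whichever of options 1 and 2 is chosen, note that iconv -c only removes byte sequences that are not valid UTF-8. A character can be perfectly valid UTF-8 and still be forbidden in XML 1.0 (most C0 control characters, such as U+0001), and an XML parser will reject that as well. The thread does not show which case applies here, so the following Python sketch is a hedged preprocessing step, not Qlik functionality: it drops invalid UTF-8 and then removes characters outside the XML 1.0 `Char` range. The function name and sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Characters NOT permitted by the XML 1.0 "Char" production. Allowed are:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML10_ILLEGAL = re.compile(
    "[^\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def sanitize_xml_bytes(data: bytes) -> bytes:
    """Return UTF-8 bytes with invalid UTF-8 sequences and
    XML-1.0-illegal characters removed."""
    text = data.decode("utf-8", errors="ignore")  # step 1: like iconv -c
    return _XML10_ILLEGAL.sub("", text).encode("utf-8")  # step 2: XML chars


# Hypothetical sample: a control character (\x01) and a stray byte (\xff).
sample = b'<?xml version="1.0" encoding="utf-8"?><Name>Prog\x01 \xff1</Name>'
print(ET.fromstring(sanitize_xml_bytes(sample)).text)
```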

               

              I will work on 1 and 2.

              But isn't there also a way to load only the valid file entries and skip or correct the invalid content within Qlik?

              Sorry for asking one more time...