joboro1
Contributor

tDTDValidator error with valid UTF-8 files and special characters

Hi there,

We are using Talend Studio 6.4.1 and are trying to process XML files that we want to validate against a DTD file beforehand. However, we run into problems with XML files containing special characters such as the euro sign (€) or the sharp s (ß). The tDTDValidator component fails with this error:

 

[FATAL]: abc.ordersvalidate_0_1.OrdersValidate - tDTDValidator_1 Invalid byte 2 of 2-byte UTF-8 sequence.

The error pattern looks like the one described in this thread ([resolved] Error with tDTDValidator) at talendforge.

It seems that tDTDValidator uses the encoding "ISO-8859-1" for the XML file regardless of the encoding declared in the XML file itself.

    String encoding = null;
    if (doctDTDValidator_1.getXmlEncoding() == null) {
        encoding = "ISO-8859-1";
    } else {
        encoding = doctDTDValidator_1.getXmlEncoding();
    }
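
To illustrate why a wrong charset matters for these characters, here is a minimal Java sketch. It is not the component's code, and the class and method names are made up for this example. It encodes ß and € as UTF-8 and then decodes the bytes once as ISO-8859-1 and once as UTF-8. It does not reproduce the exact parser error, only the underlying mismatch between the two charsets:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Talend.
final class EncodingMismatchDemo {

    // Encode the text as UTF-8, then decode those bytes as ISO-8859-1 (the wrong charset).
    static String utf8BytesReadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encode the text as UTF-8, then decode those bytes as UTF-8 (the matching charset).
    static String utf8BytesReadAsUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String[] samples = { "\u00DF", "\u20AC" }; // sharp s, euro sign
        for (String s : samples) {
            System.out.println(s.length() + " char in, "
                    + utf8BytesReadAsLatin1(s).length() + " chars out via ISO-8859-1, "
                    + utf8BytesReadAsUtf8(s).length() + " char out via UTF-8");
        }
    }
}
```

In UTF-8, ß takes two bytes and € takes three, while ISO-8859-1 maps every byte to exactly one character. Reading the UTF-8 bytes as ISO-8859-1 therefore turns each of these characters into two or three unrelated ones.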

The workaround suggested in that thread is to use tXSDValidator, which is not an option for us.

Is this a known bug in the tDTDValidator component that has been fixed in a later version? If not, what workarounds are available?

 

KR
joboro

1 Solution

Accepted Solutions
Anonymous
Not applicable

Hi,

 

The current version of tDTDValidator does not have an option to use UTF-8; it is configured for the ISO-8859-1 character set.

If you are an enterprise customer, could you please create a support case to check whether a quick patch can be provided to make the encoding configurable?

If you are using the open source version, could you please create a feature request using the link below?

 

https://jira.talendforge.org
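
Until the component is configurable, one possible stopgap is to run the DTD validation yourself with the standard JAXP parser, for example from a custom routine. The following is only a sketch under assumptions: the class and method names are hypothetical, it is not how tDTDValidator is implemented, and it has not been tested inside a Talend job. The parser is given the raw bytes, so it determines the encoding from the XML declaration itself instead of being handed a fixed charset:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, not part of Talend.
final class DtdValidationSketch {

    // Parses the stream with DTD validation enabled and returns the collected problems.
    // An empty list means the document is well-formed and valid against its DTD.
    static List<String> validate(InputStream xml) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // validate against the DTD named in the DOCTYPE
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                @Override public void error(SAXParseException e) { problems.add(String.valueOf(e.getMessage())); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(xml); // raw bytes: the parser detects the encoding on its own
        } catch (Exception e) {
            // Well-formedness, encoding, and I/O failures end up here.
            problems.add(String.valueOf(e.getMessage()));
        }
        return problems;
    }

    // Convenience wrapper for the demo: encodes the text as UTF-8 before validating.
    static List<String> validateUtf8(String xmlText) {
        return validate(new ByteArrayInputStream(xmlText.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        // The DTD is inlined here only to keep the demo self-contained.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE order [\n"
                + "  <!ELEMENT order (item)>\n"
                + "  <!ELEMENT item (#PCDATA)>\n"
                + "]>\n"
                + "<order><item>Stra\u00DFe, 5 \u20AC</item></order>";
        System.out.println("Problems found: " + validateUtf8(xml));
    }
}
```

For real files with an external DTD, `DocumentBuilder.parse(File)` can be used instead of the stream variant so that a relative DTD reference in the DOCTYPE is resolved against the location of the XML file.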

 

Warm Regards,
Nikhil Thampi

Please show your appreciation for our Talend community members by giving Kudos for the time they spend on your query. If your query is answered, please mark the topic as resolved.

2 Replies
joboro1
Contributor
Author

I opened a support case.
KR