Anonymous
Not applicable

Detect and Reject Non UTF-8 files

I need to detect and reject all incoming XML files that are not UTF-8 encoded.
If my XML input file is of the form:
<?xml version="1.0" encoding="EBCDIC"?>
<book>
<price>50£</price>
</book>
and the advanced settings of tFileInputXML and tFileOutputXML have UTF-8 selected, the job runs successfully, whereas I want the file to be rejected.
Output file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<price>50</price>
</row>
</root>
The file also needs to be rejected in the scenario below, where the XML declaration specifies UTF-8 but the data contains non-UTF-8 characters (the pound symbol in the example below).
Input file:
<?xml version="1.0" encoding="UTF-8"?>
<book>
<price>50£</price>
</book>
1 Reply
Anonymous
Not applicable
Author

Hi
There is no component or built-in function that can detect the file encoding. You can refer to the discussions on this page and write a routine in Talend to determine the file encoding.
Shong
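
As a rough illustration of the kind of routine suggested above, here is a minimal sketch in Java (the language Talend routines are written in). It is not a built-in Talend feature: the class name Utf8XmlCheck and the methods isValidUtf8, declaredEncoding and shouldReject are hypothetical names chosen for this example. It assumes the whole file fits in memory as a byte array, and it only uses standard JDK classes. The idea is to reject a file when its XML declaration names an encoding other than UTF-8, or when its bytes cannot be strictly decoded as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of Talend.
public class Utf8XmlCheck {

    // Captures the value of the encoding pseudo-attribute in an XML declaration.
    private static final Pattern DECLARED_ENCODING = Pattern.compile(
            "<\\?xml[^>]*?encoding\\s*=\\s*[\"']([^\"']+)[\"']");

    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    public static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns the encoding named in the XML declaration, or null if none is found. */
    public static String declaredEncoding(byte[] data) {
        // Only inspect the start of the file; ISO-8859-1 maps every byte to a char,
        // so this never fails regardless of the real encoding.
        int len = Math.min(data.length, 200);
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = DECLARED_ENCODING.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    /** Rejects when the declaration is not UTF-8 or the bytes are not valid UTF-8. */
    public static boolean shouldReject(byte[] data) {
        String declared = declaredEncoding(data);
        boolean declaredOk = declared == null || declared.equalsIgnoreCase("UTF-8");
        return !declaredOk || !isValidUtf8(data);
    }
}
```

In a job, something like this could be called before tFileInputXML, for example from a tJava component with the bytes obtained via java.nio.file.Files.readAllBytes, routing the file to a reject flow when shouldReject returns true. Note that a pound sign correctly saved as UTF-8 (bytes C2 A3) is valid and will pass; the check only fails when the character was saved in a single-byte encoding such as ISO-8859-1 (byte A3), which is presumably the situation in the second example.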