Anonymous
Not applicable

Detect and Reject Non UTF-8 files

I have a task to detect and reject all incoming XML files that are not UTF-8 encoded.
If my XML input file is of the form:
<?xml version="1.0" encoding="EBCDIC"?>
<book>
<price>50£</price>
</book>
and the advanced settings of both tFileInputXML and tFileOutputXML have UTF-8 selected, the job runs successfully, whereas I want the file to be rejected.
Output file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<price>50</price>
</row>
</root>
The file also needs to be rejected in the scenario below, where the XML declaration specifies UTF-8 but the data contains characters that are not UTF-8 encoded (the pound symbol in the example below).
Input file:
<?xml version="1.0" encoding="UTF-8"?>
<book>
<price>50£</price>
</book>
1 Reply
Anonymous
Not applicable
Author

Hi
There is no component or built-in function that can detect the file encoding. You can refer to the discussions on this page and write a routine in Talend to check the file encoding.
Shong
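
For illustration, a routine along the lines suggested above might look like the following minimal sketch. It is not from the original reply. The class name Utf8Check and its method names are hypothetical, and it assumes the whole file has already been read into a byte array (for example with java.nio.file.Files.readAllBytes) before the XML components run. It covers both scenarios from the question: a declaration that names an encoding other than UTF-8, and bytes that are not valid UTF-8 despite the declaration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; in Talend this would be declared public inside a user routine.
class Utf8Check {

    // Captures the encoding pseudo-attribute of an XML declaration, if present.
    private static final Pattern DECLARED_ENCODING =
            Pattern.compile("<\\?xml[^>]*?encoding\\s*=\\s*[\"']([^\"']+)[\"']");

    // True only if every byte sequence in data is well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if the declaration names UTF-8, or if no encoding is declared
    // (assumption: a missing encoding is treated as UTF-8).
    static boolean declaresUtf8(byte[] data) {
        // Only the start of the file is inspected; ISO-8859-1 maps every byte to a char,
        // so this step cannot fail on odd input.
        int headLength = Math.min(data.length, 200);
        String head = new String(data, 0, headLength, StandardCharsets.ISO_8859_1);
        Matcher m = DECLARED_ENCODING.matcher(head);
        return !m.find() || "UTF-8".equalsIgnoreCase(m.group(1));
    }

    // Reject when either the declaration or the actual bytes are not UTF-8.
    static boolean shouldReject(byte[] data) {
        return !declaresUtf8(data) || !isValidUtf8(data);
    }
}
```

In a job, something like Utf8Check.shouldReject(bytes) could be evaluated before tFileInputXML, with the result used to route the file to a reject folder. Note that strict decoding only proves the bytes are well-formed UTF-8; it cannot prove the text was originally written as UTF-8.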