olja
Contributor

Is it possible to detect whether a file is in the expected encoding (UTF-8)?

I have a Java library that consumes incoming files. The problem is that it only works if the file is UTF-8 encoded. Is there a component or a best practice for checking this with Talend? I want to reject files that are not UTF-8 encoded.

Thanks

3 Replies
Anonymous
Not applicable

Hello,

So far, there is no component or built-in function that can be used to detect a file's encoding. You could write a routine in Talend to determine the file encoding.

https://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-s...
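Since the goal in the question is only to reject files that are not UTF-8, such a routine does not necessarily need full charset detection. Below is a minimal sketch using only the Java standard library: it tries a strict UTF-8 decode and reports whether it succeeded. The class name `Utf8Check` and its method names are hypothetical, chosen for illustration. Note that plain ASCII files pass this check too, because ASCII is a subset of UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical routine name; as a Talend routine this would typically be a public class.
class Utf8Check {

    // Returns true only if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Convenience overload: reads the whole file into memory,
    // so it is only suitable for files that fit comfortably in the heap.
    static boolean isValidUtf8(String path) throws IOException {
        return isValidUtf8(Files.readAllBytes(Paths.get(path)));
    }
}
```

In a job, the boolean result could be checked (for example in a tJava component) and an exception thrown when it is false, so that the file is routed to a reject flow. This validates UTF-8 rather than guessing the actual encoding; a detection library is still needed if the real charset of rejected files matters.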

Best regards

Sabrina

olja
Contributor
Author

Thanks for the reply.

In case someone has the same problem: I solved it with an external JAR, juniversalchardet, using its org.mozilla.universalchardet.UniversalDetector class.

https://github.com/albfernandez/juniversalchardet

It worked quite well. I added a Java routine and use it in my job; if it does not return true, I throw an exception and the file is handled differently.

Anonymous
Not applicable

Hello,

Great that it works. Thanks for sharing it with the community.

Best regards

Sabrina