Anonymous
Not applicable

How to detect encoding of a text file

Hello,
Is there a way to detect the encoding of a text file automatically?
I need to read various text files, but sometimes encoding changes without announcements.
This may go unnoticed and thus corrupted data may be stored in database, which I want to avoid.
If it is not possible to detect the encoding, is there any idea to notify that the encoding has been changed?
Regards,
Aya
6 Replies
Anonymous
Not applicable
Author

Hello,
I don't think there is a direct way to do this in Talend. Since you have many files to load into the database, don't know their encoding, and even a known encoding can change without notice, you could write a Talend routine that uses one of the following libraries:
http://sourceforge.net/projects/jchardet/
http://code.google.com/p/juniversalchardet/
You could, for example, assume UTF-8 by default, or always save the encoding of the last processed file and compare it against that of the newest one.
Please let me know how you solved this requirement in your project.
Best regards,
Ladislav
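
As a rough illustration of the "compare against the expected encoding" idea, here is a minimal sketch that uses only the JDK, not jchardet or juniversalchardet. It decodes the file's bytes strictly in the expected charset and reports whether that succeeds. The class name EncodingCheck and the method decodesCleanly are hypothetical names chosen for this example; they are not part of Talend or of either library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: checks whether bytes are valid in the expected charset.
final class EncodingCheck {

    /** Returns true if the bytes decode without error in the expected charset. */
    public static boolean decodesCleanly(byte[] data, Charset expected) {
        // REPORT makes the decoder throw instead of silently substituting
        // replacement characters for bad input.
        CharsetDecoder decoder = expected.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            // The bytes are not valid in the expected charset.
            return false;
        }
    }
}
```

A job could call EncodingCheck.decodesCleanly(bytes, StandardCharsets.UTF_8) before loading a file and reject the file when the result is false. This only works when the expected charset can actually reject input: UTF-8 can, but a single-byte charset such as ISO-8859-1 accepts every byte sequence, so the check would always pass there.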
Anonymous
Not applicable
Author

Hello,
Thank you for your reply.
I was hoping that Talend might have the feature, but I will follow your advice and try using juniversalchardet.
This requirement was already set before we decided to use Talend, and since the files are sent by various customers, it is difficult to change.
Thank you again for your help.
Regards,
Aya
_AnonymousUser
Specialist III

If you have time, writing your own component is not as difficult as it seems. I may have time over the weekend, so I will also take a look at this.
My proposed behavior is as follows: the component will have one input parameter, <path to file>, and one output parameter of type string that holds the detected file encoding.
Let me know if you have other ideas about how this should work.
Best regards,
Ladislav
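
To make the proposed contract concrete (file path in, encoding name out), here is a small sketch of what the core of such a component might look like. It is deliberately limited and is not the juniversalchardet algorithm: it only recognises UTF-8 and UTF-16 byte order marks and otherwise tests whether the content is valid UTF-8. The class EncodingSniffer, its methods, and the fallback parameter are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch of the component core: path in, encoding name out.
final class EncodingSniffer {

    /** Reads the whole file and returns a guessed encoding name. */
    public static String detectFile(String path, String fallback) throws IOException {
        return detect(Files.readAllBytes(Paths.get(path)), fallback);
    }

    /** Guesses from a BOM first, then from strict UTF-8 validity. */
    public static String detect(byte[] b, String fallback) {
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        // UTF-32 BOMs are not handled; FF FE 00 00 would be reported as UTF-16LE.
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(b));
            return "UTF-8"; // pure ASCII also ends up here
        } catch (CharacterCodingException e) {
            return fallback; // not valid UTF-8; the caller decides what that means
        }
    }

    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

The fallback string lets the job decide how to treat files that are neither BOM-marked nor valid UTF-8, for example by passing "unknown" and routing those files to a reject flow. A statistical detector such as juniversalchardet would be needed to tell legacy encodings apart.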
Anonymous
Not applicable
Author

Any update on this? I would love to have such a component. I couldn't manage to implement juniversalchardet in Talend.
JWagler
Contributor

I ran into a similar issue where someone changed the encoding on us and it silently broke the system. I would love a component that reads a file and detects the encoding.