Anonymous
Not applicable

Open Studio for DQ cannot handle special characters in a CSV file encoded as UTF-8

Hi,

I'm using Talend Open Studio for Data Quality version 6.5.1 to analyze the quality of data in a CSV file encoded in UTF-8. If I select the 'Soundex Frequency' indicator for a column whose values contain special characters like "ü" and "é" and run the analysis, I get the following error message:

2018-05-04 17:14:20,232 ERROR org.talend.dq.analysis.AnalysisExecutor  - java.lang.IllegalArgumentException: The character is not mapped: Ü
java.lang.IllegalArgumentException: The character is not mapped: Ü
	at org.apache.commons.codec.language.Soundex.map(Soundex.java:226)
	at org.apache.commons.codec.language.Soundex.getMappingCode(Soundex.java:180)
	at org.apache.commons.codec.language.Soundex.soundex(Soundex.java:264)
	at org.talend.dataquality.indicators.impl.SoundexFreqIndicatorImpl.handle(SoundexFreqIndicatorImpl.java:283)
	at org.talend.dq.indicators.DelimitedFileIndicatorEvaluator.handleByARow(DelimitedFileIndicatorEvaluator.java:335)
	at org.talend.dq.indicators.DelimitedFileIndicatorEvaluator.useCsvReader(DelimitedFileIndicatorEvaluator.java:257)
	at org.talend.dq.indicators.DelimitedFileIndicatorEvaluator.executeSqlQuery(DelimitedFileIndicatorEvaluator.java:115)
	at org.talend.dq.indicators.Evaluator.evaluateIndicators(Evaluator.java:146)
	at org.talend.dq.indicators.Evaluator.evaluateIndicators(Evaluator.java:207)
	at org.talend.dq.analysis.DelimitedFileAnalysisExecutor.runAnalysis(DelimitedFileAnalysisExecutor.java:70)
	at org.talend.dq.analysis.AnalysisExecutor.execute(AnalysisExecutor.java:146)
	at org.talend.dq.analysis.AnalysisExecutorSelector.executeAnalysis(AnalysisExecutorSelector.java:171)
	at org.talend.dataprofiler.core.ui.action.actions.RunAnalysisAction$1.runInWorkspace(RunAnalysisAction.java:222)
	at org.eclipse.core.internal.resources.InternalWorkspaceJob.run(InternalWorkspaceJob.java:38)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)

 

I've already tried the solution from this post: https://community.talend.com/t5/Design-and-Development/Handling-special-characters/m-p/25169#M4268

I also checked "Allow specific characters (UTF8,...) for columns of schemas" under Window / Preferences / Talend / Specific Settings.

Neither of these solutions worked for me.

 

Is there any workaround to solve the problem?

 

Thanks in advance

Frank

 

2 Replies
Anonymous
Not applicable
Author

Hello,

Have you tried adding -Dfile.encoding=utf8 to the .ini configuration file and restarting your Studio to see if it works?

Best regards

Sabrina

msjian
Employee

Hi,
We don't support running the 'Soundex Frequency' indicator on a column whose values contain special characters like "ü" and "é" or Chinese/Japanese characters.
This error is expected behavior, and we will not fix it.
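
For anyone looking for a workaround: the stack trace shows the character arriving correctly decoded as "Ü", so the failure seems to come from the Soundex mapping (which only covers A-Z) rather than from the file encoding. One option might be to pre-process the column before profiling it, for example in a separate job or script that writes a cleaned copy of the CSV. The following is only a minimal sketch using the JDK's `java.text.Normalizer`; the class `AccentFolding` and its methods are hypothetical helpers, not part of Talend or Commons Codec, and how you wire this into your own data flow is left open.

```java
import java.text.Normalizer;

// Hypothetical helper (not a Talend or Commons Codec class): prepares
// strings so that only the letters A-Z / a-z reach a Soundex encoder.
final class AccentFolding {

    // Decompose accented letters (NFD) and remove the combining marks,
    // so "ü" becomes "u" and "é" becomes "e".
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Fold accents, then drop everything that is still not an ASCII letter.
    // Note: this discards digits, spaces and whole scripts such as Chinese
    // or Japanese, so it is lossy by design.
    static String asciiLettersOnly(String input) {
        return fold(input).replaceAll("[^A-Za-z]", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the compiler's
        // file encoding: \u00fc is "ü", \u00e9 is "é", \u4e2d is a CJK character.
        System.out.println(fold("M\u00fcller"));                  // Muller
        System.out.println(asciiLettersOnly("Ren\u00e9 \u4e2d")); // Rene
    }
}
```

Letters that have no canonical decomposition (for example "ß" or "ø") are not folded to a base letter by `fold`, and `asciiLettersOnly` simply removes them, so the resulting Soundex codes are an approximation for such values.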