Hi,
I'm using Talend Open Studio for Data Quality version 6.5.1 to analyze the quality of data in a CSV file encoded in UTF-8. If I select the 'Soundex Frequency' indicator for a column whose values contain special characters such as "ü" and "é" and then run the analysis, I get the following error message:
2018-05-04 17:14:20,232 ERROR org.talend.dq.analysis.AnalysisExecutor - java.lang.IllegalArgumentException: The character is not mapped: Ü
java.lang.IllegalArgumentException: The character is not mapped: Ü
    at org.apache.commons.codec.language.Soundex.map(Soundex.java:226)
    at org.apache.commons.codec.language.Soundex.getMappingCode(Soundex.java:180)
    at org.apache.commons.codec.language.Soundex.soundex(Soundex.java:264)
    at org.talend.dataquality.indicators.impl.SoundexFreqIndicatorImpl.handle(SoundexFreqIndicatorImpl.java:283)
    at org.talend.dq.indicators.DelimitedFileIndicatorEvaluator.handleByARow(DelimitedFileIndicatorEvaluator.java:335)
    at org.talend.dq.indicators.DelimitedFileIndicatorEvaluator.useCsvReader(DelimitedFileIndicatorEvaluator.java:257)
    at org.talend.dq.indicators.DelimitedFileIndicatorEvaluator.executeSqlQuery(DelimitedFileIndicatorEvaluator.java:115)
    at org.talend.dq.indicators.Evaluator.evaluateIndicators(Evaluator.java:146)
    at org.talend.dq.indicators.Evaluator.evaluateIndicators(Evaluator.java:207)
    at org.talend.dq.analysis.DelimitedFileAnalysisExecutor.runAnalysis(DelimitedFileAnalysisExecutor.java:70)
    at org.talend.dq.analysis.AnalysisExecutor.execute(AnalysisExecutor.java:146)
    at org.talend.dq.analysis.AnalysisExecutorSelector.executeAnalysis(AnalysisExecutorSelector.java:171)
    at org.talend.dataprofiler.core.ui.action.actions.RunAnalysisAction$1.runInWorkspace(RunAnalysisAction.java:222)
    at org.eclipse.core.internal.resources.InternalWorkspaceJob.run(InternalWorkspaceJob.java:38)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
I've already tried the solution from this post: https://community.talend.com/t5/Design-and-Development/Handling-special-characters/m-p/25169#M4268
I also checked "Allow specific characters (UTF8,...) for columns of schemas" under Window / Preferences / Talend / Specific Settings.
Neither of these worked for me.
Is there any workaround to solve the problem?
Thanks in advance
Frank
Hello,
Have you tried adding -Dfile.encoding=utf8 to the Studio's .ini configuration file and then restarting the Studio to see whether that works?
Best regards
Sabrina
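
A note on the stack trace above: the exception is raised inside org.apache.commons.codec.language.Soundex.map, and the message already shows a correctly decoded "Ü", so the file may be read fine and the failure may instead come from the Soundex mapping not covering letters outside A-Z. If that is the case, one possible workaround is to fold accented letters to their base letters before the values reach the indicator, for example by pre-cleaning the column. The sketch below uses only the standard java.text.Normalizer; the class and method names (AccentFolding, stripDiacritics) are hypothetical and not part of Talend or Commons Codec, and where such a step would be plugged into the Studio is left open.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not part of Talend or Commons Codec.
class AccentFolding {

    // Decomposes each character into base letter + combining marks (NFD),
    // then removes the combining marks, e.g. "ü" -> "u", "é" -> "e".
    static String stripDiacritics(String input) {
        if (input == null) {
            return null;
        }
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the file encoding.
        System.out.println(stripDiacritics("M\u00fcller")); // Müller
        System.out.println(stripDiacritics("Ren\u00e9"));   // René
    }
}
```

Letters that have no canonical decomposition, such as "ß" or "ø", pass through this helper unchanged and would still need an explicit replacement.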