Hi!
I'm running Talend 5.0.1 and I'm fighting hard to get rid of, or replace, control characters... This is what I have:
1. tMysqlInput reading from a MySQL database with a utf8_general_ci collation, where some of the characters appear as an "em symbol" in the query output
2. tReplace where I'm trying to replace "\u0025" with an empty string ""
3. tMap component
4. tAdvancedFileOutput where the output encoding is set to UTF8
I thought I'd solve the problem by enclosing the text in "<![CDATA[ ... ]]>", but it didn't help.
Furthermore, the tReplace component seems unable to replace the single "em" character when I search for "\u0025". If I don't enclose the text in the CDATA section, the character gets written to the file, which causes problems when I try to index the XML in another system...
Hope you're able to help me here because I'm 100% stuck with this...
Many thanks!
Hmmmm, the characters I'm trying to remove are \u0019, \u0025, \u0028 and \u0029, and each of them shows up as a single strange-looking character
I checked in the original file and it is UTF8-encoded...
Cheers
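A side note on the code points listed above, in case it helps anyone landing here: in Java, "\u0019" is the EM (End of Medium) control character, while "\u0025", "\u0028" and "\u0029" are the ordinary characters %, ( and ). Decimal 25 is hex 19, so it may be that decimal and hex codes got mixed up, but that is only a guess. A small sketch like the one below (the class and method names are made up for illustration) can show which code points a string really contains before deciding what to replace:

```java
public class CodePointDump {
    // Returns the code points of s in "U+XXXX" notation, separated by spaces.
    public static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample value only: a letter, the EM control character, a percent sign.
        System.out.println(describe("a\u0019%"));
    }
}
```

Calling something like this from a tJavaRow or a routine on the problem column should make it clear which values need to go.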
Hi! I managed to find a workaround. This is what I did: using a tMap component, I invoked the 'replaceAll' method on the column causing the problem: <column>.replaceAll("",""). I hope it will become possible to use a similar approach with the tReplace component in the future. Cheers!
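For anyone wanting to try the same kind of workaround, here is a minimal sketch in plain Java. The exact pattern passed to replaceAll isn't shown above, so the one below is an assumption: it removes the ASCII control characters that XML 1.0 does not allow (which includes U+0019) and keeps tab, line feed and carriage return. The class and method names are invented for the example.

```java
public class ControlCharCleaner {
    // Assumed pattern: control characters U+0000..U+001F except
    // tab (U+0009), line feed (U+000A) and carriage return (U+000D).
    private static final String XML_INVALID_CONTROLS =
            "[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]";

    // Returns the input with those control characters removed; null stays null.
    public static String stripControlChars(String input) {
        if (input == null) {
            return null;
        }
        return input.replaceAll(XML_INVALID_CONTROLS, "");
    }

    public static void main(String[] args) {
        // Sample value only: the EM character sits between "a" and "b".
        System.out.println(stripControlChars("a\u0019b"));
    }
}
```

In a tMap expression the same idea would be roughly row1.myColumn.replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", ""), where row1.myColumn is a placeholder for the real row and column names.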
Hi,
Couldn't you use the tReplace component with a regular expression that allows standard characters only? I'm not that good with regex, but something that allows only alphanumeric characters, including spaces.
Hope this helps.
Regards,
Arno
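A rough sketch of the whitelist idea suggested above, written as plain Java rather than as tReplace settings (whether tReplace accepts this pattern was not tested here). The pattern is an assumption: it keeps only ASCII letters, digits and spaces and drops everything else. The class and method names are hypothetical.

```java
public class AlphanumericFilter {
    // Assumed whitelist: anything that is NOT an ASCII letter, digit or space is removed.
    private static final String NOT_ALLOWED = "[^A-Za-z0-9 ]";

    // Returns the input reduced to ASCII letters, digits and spaces; null stays null.
    public static String keepAlphanumeric(String input) {
        if (input == null) {
            return null;
        }
        return input.replaceAll(NOT_ALLOWED, "");
    }

    public static void main(String[] args) {
        // Sample value only: contains punctuation and an EM control character.
        System.out.println(keepAlphanumeric("Hello, w\u0019orld 42!"));
    }
}
```

Note that a whitelist this strict also throws away punctuation and accented letters, which may be too much for UTF-8 text, so removing only the control characters is probably the safer choice.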
@avdbrink:
Hmmm, not sure. I tried to replace something like "\u000c", for example, but never got it to work with tReplace... It could be that I provided the parameters wrongly, but...
The "workaround" I applied works fine, so I'll stick with that for the time being. Thanks for your suggestion though.
Cheers