I have a requirement to create a comma-separated CSV file from a Hive table.
While inserting rows, Talend is inserting special characters into the strings.
For example:
Source: Sliver 2 – Sell
Target:
Sliver 2 – Sell |
Please help me remove these special characters.
Hi @prasad_nayani,
Could you please change the Encoding to UTF-8 in the Advanced settings of the tFileOutputDelimited component and let us know if that helps?
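The Encoding option in tFileOutputDelimited determines which character set turns the strings into bytes on disk. As a rough plain-Java analogue (not Talend's generated code; the class and method names are hypothetical), this sketch writes delimited rows with an explicitly chosen charset instead of relying on the JVM's platform default:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class DelimitedWriter {

    // Writes each row as its fields joined by the delimiter, encoding the
    // characters with the given charset rather than the platform default.
    // Note: no CSV quoting or escaping is done in this sketch.
    static void write(Path file, List<String[]> rows, String delimiter, Charset charset)
            throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, charset)) {
            for (String[] row : rows) {
                out.write(String.join(delimiter, row));
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("export", ".csv");
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Sliver 2 \u2013 Sell", "42"}); // U+2013 EN DASH
        write(file, rows, ",", StandardCharsets.UTF_8);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

Whatever reads this file afterwards has to decode it with the same charset, otherwise the dash turns into several odd characters.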
Regards,
Pratheek Manjunath
I set the encoding to UTF-8, but it is actually inserting even more special characters instead of removing them:
Sliver 2 – Sell |
I want the data to look like my source:
Sliver 2 – Sell
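The sample strings resemble a well-known pattern: the UTF-8 bytes of an en dash (E2 80 93) are read back as Windows-1252, which shows them as â, € and “. Assuming the intended character is an en dash (an assumption; the thread never shows the original text cleanly), this sketch with a hypothetical class name reproduces the effect:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates a common mix-up: text is stored as UTF-8 bytes, but a later
    // step decodes those bytes as Windows-1252.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String original = "Sliver 2 \u2013 Sell"; // U+2013 EN DASH
        String garbled = misdecode(original);
        // The single dash becomes three characters: U+00E2, U+20AC, U+201C
        System.out.println(garbled.equals("Sliver 2 \u00e2\u20ac\u201c Sell")); // true
    }
}
```

Each further wrong decode makes the string longer, which may explain why changing only the output encoding produced more special characters rather than fewer.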
I tried reading the string you gave, Sliver 2 – Sell, from a CSV file and writing the data back to another CSV file.
This assumes you are using UTF-8 encoding for both reading and writing the files (this needs to be set in the Advanced settings of the Talend components). Even if you are using Hive, you will need to check the encoding of the underlying Hadoop files.
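A minimal plain-Java sketch of the same read-then-write test, assuming UTF-8 on both sides (the class name is hypothetical). It reads strictly, so input that is not valid UTF-8 raises an error instead of being silently garbled:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class Utf8Copy {

    // Reads a text file as UTF-8 and writes the lines back out as UTF-8.
    // Files.readAllLines throws MalformedInputException (an IOException) if
    // the input is not valid UTF-8, so a wrong input encoding fails loudly.
    static void copy(Path in, Path out) throws IOException {
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("in", ".csv");
        Path out = Files.createTempFile("out", ".csv");
        Files.write(in, List.of("Sliver 2 \u2013 Sell"), StandardCharsets.UTF_8);
        copy(in, out);
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8).get(0));
    }
}
```

If the copy comes out byte-for-byte identical, the reading and writing steps agree on the encoding and the problem lies elsewhere (for example, in how the Hive/HDFS data was originally written).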
Now, let's assume that the input data is in the correct format. If you print the data in Talend, it will look like the output below.
This is because Talend uses the Courier font for log output. If you write the data to a file instead, you can see that the file contains the data shown below.
The data above is the output, as viewed in Notepad, after running the job below.
If you copy the data and paste it into MS Word, you can see it in its original form (like the font in this post).
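Because the log window, Notepad and Word can each render the same bytes differently, a more reliable check is to look at the raw bytes. A small sketch (hypothetical class name), assuming you have a local copy of the generated file, e.g. fetched with `hdfs dfs -get`:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class HexInspect {

    // Formats bytes as space-separated uppercase hex pairs, e.g. "E2 80 93".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a local copy of the generated CSV file
        byte[] bytes = Files.readAllBytes(Path.of(args[0]));
        // Print only the first 256 bytes to keep the output readable
        System.out.println(hex(Arrays.copyOf(bytes, Math.min(bytes.length, 256))));
    }
}
```

A correctly UTF-8-encoded en dash shows up as `E2 80 93`. A sequence like `C3 A2 E2 82 AC E2 80 9C` instead means the dash was encoded as UTF-8, misread as Windows-1252, and encoded as UTF-8 a second time.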
So, as long as you keep UTF-8 encoding (one of the Unicode encodings) throughout, you should be fine. On rare occasions you may need UTF-16 instead, but any such encoding can be set by choosing the custom encoding option in the Talend components.
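UTF-8 and UTF-16 both encode Unicode, but they store the same character as different bytes, so whichever one you choose, the reading and writing components must agree. A quick illustration (hypothetical class name):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {

    // Number of bytes the text occupies in the given encoding.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String dash = "\u2013"; // EN DASH
        System.out.println("UTF-8:    " + byteCount(dash, StandardCharsets.UTF_8));    // 3
        System.out.println("UTF-16LE: " + byteCount(dash, StandardCharsets.UTF_16LE)); // 2
        // Java's "UTF-16" encoder prepends a 2-byte big-endian byte-order mark
        System.out.println("UTF-16:   " + byteCount(dash, StandardCharsets.UTF_16));   // 4
    }
}
```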
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hi @nthampi,
Thanks for your effort, but
I actually want to remove the € character from the output, because my source doesn't have that special character in it, as shown below.
Source:
Sliver 2 â“ Sell
But the target .csv file contains that special character:
Sliver 2 – Sell
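Deleting the € alone would still leave â and “ in the text. If the extra characters really come from UTF-8 bytes being decoded as Windows-1252 (an assumption; fixing the encoding settings is the proper cure), a one-off cleanup could reverse that step instead of stripping characters. A hedged sketch with a hypothetical class name:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {

    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Undoes one round of "UTF-8 bytes decoded as Windows-1252": re-encode the
    // garbled text as Windows-1252 to recover the original bytes, then decode
    // those bytes as UTF-8. Only safe on text known to be garbled this way;
    // clean text containing characters such as é would be damaged.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = "Sliver 2 \u00e2\u20ac\u201c Sell"; // shows as â€“
        System.out.println(repair(garbled).equals("Sliver 2 \u2013 Sell")); // true
    }
}
```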
OK. Your earlier posts had the Euro symbol in both the source and the target, so I included it in the source data.
I copied your new data, Sliver 2 â“ Sell, into the job and printed the output. I did not get any extra Euro symbol; the output is shown below. The difference could be due to the UTF-8 settings in your environment.
Could you please double-check all your job settings and check the underlying Hadoop files once again?
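One environment difference worth ruling out is the JVM's default charset, which Java I/O classes such as `InputStreamReader` fall back to when no charset is given. On JDK 17 and earlier it is platform-dependent (often windows-1252 on Western-locale Windows); since JDK 18 it is UTF-8 (JEP 400). A small sketch to print it from the JVM that runs the job, for example by pasting the body of `describe()` into a tJava component (an assumption about how you would run it; the class name is hypothetical):

```java
import java.nio.charset.Charset;

public class EncodingEnvCheck {

    // Reports the JVM's default charset and the file.encoding property.
    static String describe() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the two environments report different values, passing `-Dfile.encoding=UTF-8` as a JVM argument is one way to align them, although setting the encoding explicitly on every component is more robust.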
Warm Regards,
Nikhil Thampi