Anonymous
Not applicable

Export to UTF-8 CSV using Data Preparation Tool Free

Hello

 

We are using the Data Preparation Free tool to manage an Excel file that will then be imported into a Marketing Automation tool called Marketo.

 

Unfortunately, when we export the data from the Data Preparation tool using CSV UTF-8 encoding, any Latin characters with accents or umlauts (é, ü, etc.) come out incorrectly. When we do the same thing with MS Excel, everything comes out fine.

 

What is it that we are doing wrong when exporting data from the Data Preparation Free tool?

 

Thanks for the help

9 Replies
Anonymous
Not applicable
Author

Hi,

 

It seems surprising at first glance ... especially since Data Prep only supports UTF-8 when exporting to CSV.

 

So, a few questions:

  • Can you clarify in which tool you have faced the encoding issue when opening the CSV file generated by Data Prep?
  • Could you share a sample of the source file and of the resulting file exported from Data Prep? It can of course be dummy data instead of your real data, as long as it reproduces the issue.

 

 

Thank you

Anonymous
Not applicable
Author

Hi Gwendal

 

Thanks for replying so quickly.

 

We use the Talend Data Preparation Free Desktop tool, then open the file in MS Excel 2016.

 

Here are two examples

Company | First Name | Last Name | Function Title | Address            | Postal Code | Post Office       | Country
ABC     | John       | Smith     | CIO            | Stephensonstraße 1 | 12345       | Frankfurt a. Main | Germany
XYZ     | Jörg       | Munch     | CDO            | Arabellastraße 4   | 12345       | München           | Germany

 

Hope this helps

Thanks

Axel

Anonymous
Not applicable
Author

Hi Axel,

 

That is what I assumed ... it's actually an Excel issue: Excel doesn't automatically detect the encoding of CSV files and always assumes Windows-1252 ... hence the issue you see when opening the CSV file generated by Data Prep. If you open the CSV file in Notepad++ or any other decent text editor, you will see that the file generated by Data Prep is correctly encoded in UTF-8.

 

Note that you can get Excel to display a UTF-8 file correctly ... it's anything but user-friendly, though. See https://www.youtube.com/watch?v=GcYt1mJbwk4 for instance (it is not on Excel 2016, but the sequence is essentially the same). And it has been this way forever ... so it is unlikely to be fixed by Microsoft.
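For what it's worth, Excel (including 2016) will also auto-detect UTF-8 if the file starts with a byte-order mark (the bytes EF BB BF). Data Prep doesn't expose that option, but a tiny post-processing step can write it. A minimal sketch in plain Java; the file name and columns are just made-up examples:

```java
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class BomCsvWriter {
    public static void main(String[] args) throws Exception {
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream("export.csv"), StandardCharsets.UTF_8)) {
            // Writing U+FEFF first puts the bytes EF BB BF at the start of
            // the file, which Excel uses to recognize UTF-8
            w.write('\uFEFF');
            w.write("Company;First Name;Last Name\n");
            w.write("XYZ;Jörg;Munch\n"); // the ö now survives in Excel
        }
    }
}
```

Double-clicking the resulting file should then show the accented characters correctly without going through the import wizard.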

 

Good news: in the next Data Prep release (planned for January 2018), you will be able to select the encoding when exporting to CSV. So you'll be able to export in Windows-1252 so that Excel can read it correctly natively.

 

Hope this helps,

 

Gwendal

Anonymous
Not applicable
Author


Good news: in the next Data Prep release (planned for January 2018), you will be able to select the encoding when exporting to CSV. So you'll be able to export in Windows-1252 so that Excel can read it correctly natively.

Hello @gvaznunes,
I have the exact same problem in Talend Open Studio. Do you know if there is another way to solve it there?

Thank you, and sorry for posting about it in this conversation!

Anonymous
Not applicable
Author

Hi,

 

Encoding is configurable in tFileOutputDelimited ... in the "Advanced" tab:

[screenshot: tFileOutputDelimited "Advanced" tab, showing the Encoding option]

 

So just select the appropriate encoding here ... and job done!
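To see why that setting matters: the same string produces different bytes depending on the charset the writer uses, which is all the Encoding option controls. A quick illustration in plain Java (the language Talend Jobs compile to):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) {
        String city = "München";
        // UTF-8 encodes ü as two bytes (0xC3 0xBC) -> 8 bytes total
        byte[] utf8 = city.getBytes(StandardCharsets.UTF_8);
        // windows-1252 encodes ü as one byte (0xFC) -> 7 bytes total
        byte[] cp1252 = city.getBytes(Charset.forName("windows-1252"));
        System.out.println(utf8.length + " / " + cp1252.length); // 8 / 7
    }
}
```

Whichever charset writes the file must also be the one the consumer reads it with; the bytes themselves carry no label.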

 

Regards,

 

Gwendal

Anonymous
Not applicable
Author

@gvaznunes, thank you for your help. I already knew about that, but the problem is that I have already set the encoding and I'm still having problems with some characters.

I tried changing the encoding from UTF-8 to the others, but I still have the problem...

The source is an Oracle DB with CP-1252 encoding (i.e. Windows-1252), and the issues are with characters like Ä / Ü and similar...

Do you have some idea on how to resolve this?
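One common way Ä / Ü get corrupted in a setup like this is some stage reading the CP-1252 bytes as if they were UTF-8: the single byte 0xC4 (Ä in Windows-1252) is not a valid UTF-8 sequence on its own, so the decoder substitutes the replacement character U+FFFD (shown as ?). A small sketch of that failure mode, assuming this is the mismatch happening somewhere in the Job:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MismatchDemo {
    public static void main(String[] args) {
        // "Ä" encoded as windows-1252 is the single byte 0xC4
        byte[] cp1252Bytes = "Ä".getBytes(Charset.forName("windows-1252"));
        // Decoding that byte as UTF-8 fails: 0xC4 is a UTF-8 lead byte
        // with no continuation byte, so Java substitutes U+FFFD
        String misread = new String(cp1252Bytes, StandardCharsets.UTF_8);
        System.out.println(misread);
    }
}
```

If you see ? (or diamonds with question marks) in the output, look for the component that reads the Oracle data and make sure it is told the source is CP-1252, not UTF-8.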

Anonymous
Not applicable
Author

@gvaznunes @abaran hi guys!

Do you have any advice to solve this problem?

I will write down some of the words that are causing this problem (on the left the correct word, on the right what I find in the CSV after the extraction):
- Querétaro → QuerÃ©taro
- Cuautitlán → CuautitlÃ¡n
- Zürich → ZÃ¼rich
- KLÖCKNER → KLÃ–CKNER
- Chéraga → ChÃ©raga

 

Right now the word on the left is the one extracted directly from the table (into a CSV file) from the Oracle DB with CP-1252 encoding, and the word on the right is the one extracted by the Talend procedure with UTF-8 encoding, and also with the custom CP-1252 encoding...

Just a little background about my Talend Job... Before the extraction to the CSV file, I use a REST call (tRESTClient) via an Application Server to fill the table on the destination database. The data are stored with the correct encoding in the source DB, but when I check the destination DB the data are already stored incorrectly...
Is it possible that there is a different encoding on the Application Server?
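Incidentally, accented characters coming out as two-character sequences like Ã© or Ã¼ is the classic signature of UTF-8 bytes being decoded as Windows-1252 somewhere in the chain. The root cause should be fixed at the stage that mis-decodes, but strings garbled this way can usually be repaired by reversing the mistake. A sketch in plain Java, assuming this is indeed the corruption pattern:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    public static void main(String[] args) {
        // "ZÃ¼rich" is what "Zürich" becomes when its UTF-8 bytes
        // (0xC3 0xBC for ü) are decoded as windows-1252
        String garbled = "ZÃ¼rich";
        // Reverse it: re-encode as windows-1252 to recover the original
        // byte sequence, then decode those bytes as UTF-8
        String repaired = new String(
                garbled.getBytes(Charset.forName("windows-1252")),
                StandardCharsets.UTF_8);
        System.out.println(repaired); // Zürich
    }
}
```

This round-trip only works while every garbled byte still maps to a Windows-1252 character; if a later stage has already replaced characters with ?, the information is gone.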


Thank you!

Anonymous
Not applicable
Author

Hello,
I'm here again, still with the same issue...
I checked in detail what happens in my Job and saw that the REST call on the Application Express works properly; the extracted data are in the correct encoding.

I was looking around on the web and saw that a lot of people have this issue, as you can see in the following link: https://community.talend.com/t5/Design-and-Development/UTF-8-ANSI-problem/td-p/190040 .

How can we solve this?

It seems that it is all related to this UTF-8 encoding problem: http://www.i18nqa.com/debug/utf8-debug.html , but I cannot find a way to solve it...
I hope that someone can help me.

Thank you.

Anonymous
Not applicable
Author

Hello,
I managed to partially resolve my problem. I added two parameters to my Job:
- one parameter on the tOracleOutput component, in the Advanced settings: "characterEncoding=UTF-8"

- one JVM argument in the execution section of the main Job: -Dfile.encoding=UTF-8.

[screenshot: Run tab with the JVM argument -Dfile.encoding=UTF-8]

Now I have just one issue left: the characters - and ' are inserted into my Oracle DB as .

Do you have any idea on how to resolve this last issue?

Thank You.