Problem with UTF-8 "fixed record" files

Hello,

I have data in a text file, encoded in UTF-8 (without BOM), with fixed-length records.

When the file contains a special (non-ASCII) character, that character is counted as 2 characters, and all the data after it on the same line is parsed wrong (shifted by one position).

This file:

BRAND           MODEL              DATE    VALUE
Audi            A3                 20140101abcdefgh
Audi            A4                 20140202abcdefgh
Audi            Coupé              20140303abcdefgh

loaded with QlikView:

Data:
LOAD @1:16 AS BRAND,
     @17:35 AS MODEL,
     @36:43 AS DATE,
     @44:n AS VALUE
FROM test.csv
(fix, utf8, header is 1 lines);

gives me a wrong DATE for the last record: "2014030" instead of "20140303", because the "é" in "Coupé" is counted as 2 characters (it takes 2 bytes in UTF-8).

Its VALUE comes out as "3abcdefgh" (with a leading "3" that should not be there).
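To show what I think is going on, here is a small sketch in Python rather than QlikView. It assumes the fields are being cut at byte offsets instead of character offsets; that is my guess from the symptoms, not something I have confirmed. The helper names (make_record, slice_by_chars, slice_by_bytes) are made up for this illustration.

```python
# Illustration only: reproduces the shift outside QlikView.
# Assumption: the fixed-record reader cuts fields at BYTE offsets.

def make_record(brand, model, date, value):
    # Pad by characters: BRAND is 16 wide, MODEL 19, DATE 8, VALUE is the rest.
    return f"{brand:<16}{model:<19}{date:<8}{value}"

def slice_by_chars(line):
    # Positions @1:16, @17:35, @36:43, @44:n counted in characters.
    parts = (line[0:16], line[16:35], line[35:43], line[43:])
    return tuple(p.strip() for p in parts)

def slice_by_bytes(line):
    # Same positions, but counted in UTF-8 bytes.
    raw = line.encode("utf-8")
    parts = (raw[0:16], raw[16:35], raw[35:43], raw[43:])
    return tuple(p.decode("utf-8", errors="replace").strip() for p in parts)

record = make_record("Audi", "Coupé", "20140303", "abcdefgh")

print(slice_by_chars(record))  # fields as I expect them
print(slice_by_bytes(record))  # fields shifted by one position after the "é"
```

Slicing by bytes gives exactly the DATE and VALUE I see in QlikView, while slicing by characters gives the values I expect.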

If I convert the same file to ANSI, I don't have the problem.

(Please don't just answer "so, convert the file to ANSI".)
