Solved: Special characters in csv files , Encoding ANSI , ... - Qlik Community

karandama2006 · ‎2019-07-12

Hi All,

I'm facing problems in loading data from csv file , its encoding is ANSI and character set is windows-1252.

It also has a unusual delimitter , which is the Pilcrow character ¶

I have lot of similar files , I'm able to load most of them , but for couple of files I'm getting an invalid number error (cannot convert to INT)

I found the record which was causing issue , visually I cannot see any special characters..

8¶1¶¶¶¶¶¶¶¶¶¶¶10000¶¶¶¶¶¶¶¶¶¶¶¶4¶¶¶¶¶2019-05-06¶4¶

But when I run it through Talend , I get below error

com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [339]: invalid number: not a valid number string '': type_code=29, index=10not a valid number string '10000': type_code=29, index=12not a valid number string '': type_code=29, index=16
at com.sap.db.jdbc.exceptions.SQLExceptionSapDB._newInstance(SQLExceptionSapDB.java:191)
at com.sap.db.jdbc.exceptions.SQLExceptionSapDB.newInstance(SQLExceptionSapDB.java:42)
at com.sap.db.jdbc.packet.HReplyPacket._buildExceptionChain(HReplyPacket.java:977)

I tried using replace in all my numeric fields , replaceAll("[^\\x00-\\x7F]", "")

But it does not help .

If I try to convert the file in UTF-8 on notepad++ , everything is messed up...

Below rows from the same file , load without any issue

1¶0¶StringOfLength100¶test¶123456¶ES¶123456789¶0001041090¶2019-12-11 00:00:00.0¶10000¶N¶10000¶10000¶StringOfLength1000¶2018-12-11 00:00:00.0¶23¶1¶2018-12-11 00:00:00.0¶Ko ¶1100¶A¶98000¶09¶7¶5¶2018-12-11 00:00:00.0¶AU¶St ame¶12¶2019-05-06¶1¶100.00
2¶1¶StringOfLength100¶test¶123456¶ES¶123456789¶0001041090¶2019-12-11 00:00:00.0¶10000¶N¶10000¶10000¶StringOfLength1000¶2018-12-11 00:00:00.0¶23¶1¶2018-12-11 00:00:00.0¶Sha¶1100¶A¶98000¶09¶7¶5¶2018-12-11 00:00:00.0¶AU¶St ame¶12¶2019-05-06¶1¶100.00

Any pointers or help ??

Anonymous · ‎2019-07-12

Hi,

The issue is for the column containing 10000 since it is having some padded characters. Could you please try below function to remove them?

row1.data.replaceAll("[^\\d]", "")

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

View solution in original post

Anonymous · ‎2019-07-12

Hi,

The issue is for the column containing 10000 since it is having some padded characters. Could you please try below function to remove them?

row1.data.replaceAll("[^\\d]", "")

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

karandama2006 · ‎2019-07-15

Thanks that works , I've used below code

row1.data.equals("") ? null : Integer.parseInt(row1.data.replaceAll("[^\\d]", ""))

Special characters in csv files , Encoding ANSI , character set Windows-1252

Big Data

v7.x