
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Special characters in csv files , Encoding ANSI , character set Windows-1252
Hi All,
I'm facing problems in loading data from csv file , its encoding is ANSI and character set is windows-1252.
It also has a unusual delimitter , which is the Pilcrow character ¶
I have lot of similar files , I'm able to load most of them , but for couple of files I'm getting an invalid number error (cannot convert to INT)
I found the record which was causing issue , visually I cannot see any special characters..
8¶1¶¶¶¶¶¶¶¶¶¶¶10000¶¶¶¶¶¶¶¶¶¶¶¶4¶¶¶¶¶2019-05-06¶4¶
But when I run it through Talend , I get below error
com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [339]: invalid number: not a valid number string '': type_code=29, index=10not a valid number string '10000': type_code=29, index=12not a valid number string '': type_code=29, index=16
at com.sap.db.jdbc.exceptions.SQLExceptionSapDB._newInstance(SQLExceptionSapDB.java:191)
at com.sap.db.jdbc.exceptions.SQLExceptionSapDB.newInstance(SQLExceptionSapDB.java:42)
at com.sap.db.jdbc.packet.HReplyPacket._buildExceptionChain(HReplyPacket.java:977)
I tried using replace in all my numeric fields , replaceAll("[^\\x00-\\x7F]", "")
But it does not help .
If I try to convert the file in UTF-8 on notepad++ , everything is messed up...
Below rows from the same file , load without any issue
1¶0¶StringOfLength100¶test¶123456¶ES¶123456789¶0001041090¶2019-12-11 00:00:00.0¶10000¶N¶10000¶10000¶StringOfLength1000¶2018-12-11 00:00:00.0¶23¶1¶2018-12-11 00:00:00.0¶Ko ¶1100¶A¶98000¶09¶7¶5¶2018-12-11 00:00:00.0¶AU¶St ame¶12¶2019-05-06¶1¶100.00
2¶1¶StringOfLength100¶test¶123456¶ES¶123456789¶0001041090¶2019-12-11 00:00:00.0¶10000¶N¶10000¶10000¶StringOfLength1000¶2018-12-11 00:00:00.0¶23¶1¶2018-12-11 00:00:00.0¶Sha¶1100¶A¶98000¶09¶7¶5¶2018-12-11 00:00:00.0¶AU¶St ame¶12¶2019-05-06¶1¶100.00
Any pointers or help ??
Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
The issue is for the column containing 10000 since it is having some padded characters. Could you please try below function to remove them?
row1.data.replaceAll("[^\\d]", "")
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
The issue is for the column containing 10000 since it is having some padded characters. Could you please try below function to remove them?
row1.data.replaceAll("[^\\d]", "")
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
row1.data.equals("") ? null : Integer.parseInt(row1.data.replaceAll("[^\\d]", ""))
