Skip to main content
Announcements
A fresh, new look for the Data Integration & Quality forums and navigation! Read more about what's changed.
cancel
Showing results for 
Search instead for 
Did you mean: 
karandama2006
Creator

Special characters in csv files , Encoding ANSI , character set Windows-1252

Hi All,

I'm facing problems in loading data from csv file , its encoding is ANSI and character set is windows-1252.

It also has a unusual delimitter , which is the Pilcrow character ¶

I have lot of similar files , I'm able to load most of them , but for couple of files I'm getting an invalid number error (cannot convert to INT)

I found the record which was causing issue , visually I cannot see any special characters..

 

8¶1¶¶¶¶¶¶¶¶¶¶¶10000¶¶¶¶¶¶¶¶¶¶¶¶4¶¶¶¶¶2019-05-06¶4¶

 

But when I run it through Talend , I get below error

 

com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [339]: invalid number: not a valid number string '': type_code=29, index=10not a valid number string '10000': type_code=29, index=12not a valid number string '': type_code=29, index=16
at com.sap.db.jdbc.exceptions.SQLExceptionSapDB._newInstance(SQLExceptionSapDB.java:191)
at com.sap.db.jdbc.exceptions.SQLExceptionSapDB.newInstance(SQLExceptionSapDB.java:42)
at com.sap.db.jdbc.packet.HReplyPacket._buildExceptionChain(HReplyPacket.java:977)

 

I tried using replace in all my numeric fields , replaceAll("[^\\x00-\\x7F]", "") 

But it does not help .

If I try to convert the file in UTF-8 on notepad++ , everything is messed up...


Below rows from the same file , load without any issue 

 


1¶0¶StringOfLength100¶test¶123456¶ES¶123456789¶0001041090¶2019-12-11 00:00:00.0¶10000¶N¶10000¶10000¶StringOfLength1000¶2018-12-11 00:00:00.0¶23¶1¶2018-12-11 00:00:00.0¶Ko ¶1100¶A¶98000¶09¶7¶5¶2018-12-11 00:00:00.0¶AU¶St ame¶12¶2019-05-06¶1¶100.00
2¶1¶StringOfLength100¶test¶123456¶ES¶123456789¶0001041090¶2019-12-11 00:00:00.0¶10000¶N¶10000¶10000¶StringOfLength1000¶2018-12-11 00:00:00.0¶23¶1¶2018-12-11 00:00:00.0¶Sha¶1100¶A¶98000¶09¶7¶5¶2018-12-11 00:00:00.0¶AU¶St ame¶12¶2019-05-06¶1¶100.00

 

Any pointers or help ??

0683p000009M6Zb.png0683p000009M6Kx.png

Labels (2)
1 Solution

Accepted Solutions
Anonymous
Not applicable

Hi,

 

    The issue is for the column containing 10000 since it is having some padded characters. Could you please try below function to remove them?

 

row1.data.replaceAll("[^\\d]", "")

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

 

 

View solution in original post

2 Replies
Anonymous
Not applicable

Hi,

 

    The issue is for the column containing 10000 since it is having some padded characters. Could you please try below function to remove them?

 

row1.data.replaceAll("[^\\d]", "")

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

 

 

karandama2006
Creator
Author

Thanks that works , I've used below code

row1.data.equals("") ? null : Integer.parseInt(row1.data.replaceAll("[^\\d]", ""))