When using the tFileInputDelimited component to parse large text files, I encounter parse exceptions on some lines but not all.
For example, this date cannot be parsed: "2013-10-01 00:00:57.8501" using the pattern "yyyy-MM-dd HH:mm:ss.SSSS", which is strange because the pattern matches.
However, when I parse the date using custom Java code in the tMap component, as shown below, it works perfectly:
new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSS").parse(row1.timestamp);
This is the error thrown:
Exception in component tFileOutputDelimited_1
java.text.ParseException: Unparseable date: "2013-10-01 00:00:57.8501"
    at java.text.DateFormat.parse(DateFormat.java:357)
    at vex_poc.loadvps_0_1.LoadVPS.tMSSqlInput_1Process(LoadVPS.java:1657)
    at vex_poc.loadvps_0_1.LoadVPS.runJobInTOS(LoadVPS.java:2795)
    at vex_poc.loadvps_0_1.LoadVPS.main(LoadVPS.java:2660)
Across two million rows of data, date parsing errors like the following were reported:
Exception in component tFileOutputDelimited_1 java.text.ParseException: Unparseable date: "2013-10-01 00:00:57.8501"
Are you reading the data from a text file, or trying to write the data into a file? Why is the exception occurring on tFileOutputDelimited? Can you upload a screenshot of the job? It would help us find the problem.
Shong
Hi again,
Thanks for your help!
I found the problem in the end. The data contained several non-printable characters at the start of some lines in the dataset. I only found these characters on Linux using 'cat --show-nonprinting'. When I first worked on the file on Mac OS, I couldn't see them for some unknown reason.
Glad it's solved now, my bad. Doesn't the trimming function in Talend remove these?
52737 2013-10-01 23:59:21.9805 608197 VPS 2701 372474^M
52738 M-oM-;M-?2013-10-01 00:01:09.4939 175144 VPS 2700 372654^M
52739 2013-10-01 00:02:06.0605 195582 VPS 2726 372558^M
52740 2013-10-01 00:02:06.8094 310077 VPS 2726 372549^M
Cheers,
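For anyone hitting the same thing: in the 'cat --show-nonprinting' output above, M-oM-;M-? is how the bytes EF BB BF are displayed, which is the UTF-8 byte order mark, and ^M is a carriage return. Java's String.trim() only removes characters at or below U+0020, so a decoded BOM (U+FEFF) survives it. Whether Talend's trim option behaves the same way is an assumption on my part, not something I have verified. Below is a minimal sketch of cleaning the field before parsing, for example from a tMap expression or a custom routine. It assumes the file is read as UTF-8, so the BOM arrives as the single character U+FEFF. The names BomSafeDateParser, stripLeadingJunk and parseTimestamp are made up for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class BomSafeDateParser {

    // Pattern taken from the original post.
    private static final String PATTERN = "yyyy-MM-dd HH:mm:ss.SSSS";

    // Strips a decoded BOM (U+FEFF), control characters such as CR,
    // and ordinary whitespace from both ends of the field.
    // Assumption: the input was decoded as UTF-8, so the BOM is one char.
    static String stripLeadingJunk(String raw) {
        int start = 0;
        int end = raw.length();
        while (start < end && isJunk(raw.charAt(start))) {
            start++;
        }
        while (end > start && isJunk(raw.charAt(end - 1))) {
            end--;
        }
        return raw.substring(start, end);
    }

    private static boolean isJunk(char c) {
        // U+FEFF is neither whitespace nor an ISO control, so it needs its own check.
        return c == '\uFEFF' || Character.isISOControl(c) || Character.isWhitespace(c);
    }

    // Cleans the field, then parses it with the pattern from the post.
    static Date parseTimestamp(String raw) throws ParseException {
        return new SimpleDateFormat(PATTERN).parse(stripLeadingJunk(raw));
    }

    public static void main(String[] args) throws ParseException {
        String dirty = "\uFEFF2013-10-01 00:01:09.4939\r";
        // trim() removes the trailing CR but leaves the leading BOM in place.
        System.out.println(dirty.trim().charAt(0) == '\uFEFF');
        System.out.println(parseTimestamp(dirty));
    }
}
```

One caveat on the pattern itself: in SimpleDateFormat, S means milliseconds, so as far as I know a lenient parser reads ".4939" as 4939 milliseconds rather than as a fraction of a second, and the parsed time can end up a few seconds off.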