Hi Talend Community,
I'm designing a simple ETL task like so:
tFileInputDelimited --> tMap --> tDBOutput
The job works fine with small CSVs, but when we applied real-world data, some columns that are supposed to hold integers (e.g. id) contained invalid values such as ' ', and the job failed.
We're planning to build a simple pipeline that handles such failures: the "good" rows are uploaded to a database table, and the "bad" rows are written to another, similar table, without killing the job.
The problem I'm facing is that the job dies in two cases:
1. It fails at the tFileInputDelimited component, because the "id" field of the target schema in tFileInputDelimited is of type Integer.
2. I changed the id field of the target schema in tFileInputDelimited to String and handled the conversion in tMap with the expression Integer.valueOf(row1._id) (a guarded alternative is sketched after the log below).
In both cases the job stops after encountering the bad row.
Starting job test_job at 13:06 19/10/2018.
[statistics] connecting to socket on port 3624
[statistics] connected
Exception in component tMap_1 (test_job)
java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.valueOf(Unknown Source)
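For case 2, one way to avoid the NumberFormatException would be to move the conversion into a custom Talend routine that returns null instead of throwing (a minimal sketch; SafeParse and toInteger are names I made up, not built-ins):

    public class SafeParse {
        // Parse a String to Integer; return null for blank or malformed
        // input instead of throwing NumberFormatException.
        public static Integer toInteger(String s) {
            if (s == null || s.trim().isEmpty()) {
                return null;
            }
            try {
                return Integer.valueOf(s.trim());
            } catch (NumberFormatException e) {
                return null; // bad value: let a downstream filter reject the row
            }
        }
    }

The tMap expression would then be SafeParse.toInteger(row1._id), and a filter on the output could route rows with a null id to a reject flow. But that only covers this one conversion.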
Could someone recommend a simple reference job that catches all kinds of errors (not only encoding/parsing errors), handles them separately, and moves on to the next row instead of killing the job?
Thanks!
In short, I would like a general solution where all kinds of errors (e.g. rows with empty id fields) are captured without failing the job. How could that be modelled?
Hi,
To avoid this type of error, read every column as String and check the types with the tConvertType component: the successfully converted records come out of its main flow, and the failing records come out of its reject flow, so you can collect both. In addition, you can use the tSchemaComplianceCheck component to reject any row that does not comply with the target schema.
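Conceptually, the main/reject split that tConvertType gives you works like the following plain-Java sketch (this is not Talend-generated code; the row data and names are just for illustration):

    import java.util.ArrayList;
    import java.util.List;

    public class RejectFlowSketch {
        public static void main(String[] args) {
            // Rows arrive as Strings, as tFileInputDelimited would deliver them.
            String[][] rows = { {"1", "alice"}, {" ", "bob"}, {"3", "carol"} };

            List<String[]> main = new ArrayList<>();   // good rows -> main table
            List<String[]> reject = new ArrayList<>(); // bad rows  -> reject table

            for (String[] row : rows) {
                try {
                    Integer.valueOf(row[0].trim()); // a type check like the one tConvertType performs
                    main.add(row);
                } catch (NumberFormatException e) {
                    reject.add(row); // collect the bad row and move on to the next one
                }
            }
            System.out.println("main=" + main.size() + ", reject=" + reject.size());
        }
    }

Each input row ends up in exactly one of the two flows, and the loop never aborts, which is the behaviour you want from the job.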
Thanks
Kailash Yadav