[resolved] Save rejected rows

Loneliness · ‎2014-09-24

Hi everybody, I am reading from a delimited file and writing the output to a table. The source file has this schema:
a:varchar
b:integer
c:date
c:varchar
Some of the rows have more fields than expected and should be rejected. What I need to do is save these rejected rows somewhere. Can I achieve that without knowing in advance how many fields a row has?
Thank you all
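
One way to detect such rows without knowing the field count in advance is to read each line as a single string and count its fields. The following is a minimal sketch in plain Java, not a Talend component: the class name `FieldCounter` and the method `countFields` are hypothetical, and it assumes a simple separator with no quoted or escaped fields.

```java
import java.util.regex.Pattern;

class FieldCounter {
    // Counts the fields in a delimited line. The -1 limit keeps trailing
    // empty fields, so "3,b," counts as 3 fields rather than 2.
    // Pattern.quote treats the separator literally (safe for "|" or ".").
    public static int countFields(String line, String separator) {
        return line.split(Pattern.quote(separator), -1).length;
    }

    public static void main(String[] args) {
        int expected = 4; // number of columns in the schema above
        String line = "x,1,2014-09-24,y,extra"; // made-up sample row
        int actual = countFields(line, ",");
        System.out.println((actual > expected ? "reject: " : "accept: ") + line);
    }
}
```

Because the whole line is still available as one string, a row that fails the check can be written out unchanged, extra fields included.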

Anonymous · ‎2014-09-24

Hi,

Some of the rows have more fields than expected and should be rejected,

Could you please elaborate on your case with an example showing the input and the expected output values? Are you looking for TalendHelpCenter:tSchemaComplianceCheck, which helps ensure the data quality of any source data against a reference data source?
Best regards
Sabrina

Loneliness · ‎2014-09-24

Hi Sabrina, as you can see in the screenshot, my file has 2 fields: region_id: integer and region_name: varchar.
The last 2 rows of the file have 3 fields, and what I want to do is save the rejected rows in all cases. tSchemaComplianceCheck checks the types, nullability and length of rows against reference values, but what if my file has a row whose first 2 fields are correct but which also has additional fields? tSchemaComplianceCheck will not detect this row. I want the third row of my file (3,b,c) to be rejected and saved somehow... Is that possible?
Thank you

Anonymous · ‎2014-09-24

Hi,
You can set a length limit on your column "Region_name" so that the third row of your file (3,b,c) is rejected.
See my picture, and feel free to let me know if this works for you.
Best regards
Sabrina

Loneliness · ‎2014-09-24

Thank you Sabrina for your reply. Actually, this solves only part of my problem: I still need to save the whole rejected row in my output, including any additional fields, and your solution only shows the first 2 fields because the source file schema has only 2 fields.
For example, the row (3,b,c) is correctly rejected, but in the rejected output I only have:
|code|name|errorMessage            |
|----+----+------------------------|
|3   |b   |name.length() > 1 failed|
so I lose the third field (c)...
Thanks again

Anonymous · ‎2014-09-25

Hi,
For your input data:
1,Europe
2,Asia
a,b,c
3,b,c

If you want to use "," as the field separator, you need to add a third column to the schema for your third field (c)...
See my pics
Best regards
Sabrina

Anonymous · ‎2014-09-26

Hi cheaito,

Any update on your issue? Did this solution resolve it?
Best regards
Sabrina

Loneliness · ‎2014-09-26

Hi Sabrina, actually no... What I wanted to achieve is the following:
I define the file schema as 2 fields: Field1: int; Field2: String.
For some reason, the source file has some wrong rows that do not respect this schema; these rows have more fields (more separators).
I want my job to reject these rows and to show me each rejected row in its entirety, not just part of it!
I can't modify the schema definition of the source file because it should have only 2 fields. I want the job to reject any rows that have more fields, and in my output I want to see something like this:
The row: "3,b,c" was rejected because it has more fields than expected!
Thank you again
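
The behaviour described in this last post could be sketched as below, using the sample rows posted earlier in the thread. This is an illustration in plain Java under stated assumptions, not a confirmed Talend solution: `RejectExtraFields` and `rejectMessage` are hypothetical names, and the input is assumed to have no quoted fields containing the separator. In a Talend job, similar logic could presumably run in a tJavaRow placed after a component that reads each line as a single column (for example tFileInputFullRow), but that wiring was not confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RejectExtraFields {
    // Hypothetical helper: returns a reject message for a line with more
    // fields than expected, or null when the field count is acceptable.
    public static String rejectMessage(String line, String separator, int expectedFields) {
        // The -1 limit keeps trailing empty fields in the count.
        int actual = line.split(Pattern.quote(separator), -1).length;
        if (actual > expectedFields) {
            return "The row: \"" + line + "\" was rejected because it has more fields than expected!";
        }
        return null;
    }

    public static void main(String[] args) {
        String[] lines = {"1,Europe", "2,Asia", "a,b,c", "3,b,c"};
        List<String> accepted = new ArrayList<String>();
        List<String> rejected = new ArrayList<String>();
        for (String line : lines) {
            String message = rejectMessage(line, ",", 2);
            if (message == null) {
                accepted.add(line);    // continue to the normal 2-field parsing
            } else {
                rejected.add(message); // the full original line is inside the message
            }
        }
        for (String row : accepted) System.out.println("OK: " + row);
        for (String msg : rejected) System.out.println(msg);
    }
}
```

The design choice here is to test the raw line before it is mapped onto the 2-field schema, since the extra fields are no longer available once the row has been parsed into two columns.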

Talend Data Integration

v5.x