I'm having trouble gracefully handling changes in my Excel input during job execution. I have numerous monthly reports to ingest, but the columns included in the reports are not static (they change between months). To cope with this I have used tSchemaComplianceCheck to validate the main flow after tFileInputExcel, with one compliance check for each schema type I encountered in testing. My problem is that tFileInputExcel also has an input schema associated with it, so some rows are rejected at this stage before they ever reach my schema-checking stage.
What is best practice when dealing with changeable inputs during a job? And is it possible to stop tFileInputExcel performing any checks, so that at least all my rows pass through my schema compliance check?
Hi,
How many schemas do you have to compare?
What is the maximum number of columns? What is the maximum data length possible in any column?
Say you have 10 columns maximum and 50 is the maximum length:
create a metadata with 10 columns, each a String with a length of 50.
You can load that into a tHashOutput and use a tHashInput each time you compare against a schema type.
Cheers!
Gatha
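To make the idea above concrete: read every cell as a plain String first, then probe the row against each candidate schema until one accepts it. This is a minimal plain-Java sketch of that logic, not Talend-generated code; the class, schema definitions, and column types are all illustrative assumptions.

```java
// Hypothetical sketch: probe a row (read as all-String) against candidate
// schemas. A schema here is just an expected column count plus a type tag
// per column; a row matches when every cell parses as its declared type.
public class SchemaProbe {

    enum ColType { STRING, INT, DOUBLE }

    static boolean matches(String[] row, ColType[] schema) {
        if (row.length != schema.length) return false;
        for (int i = 0; i < row.length; i++) {
            try {
                switch (schema[i]) {
                    case INT:    Integer.parseInt(row[i]); break;
                    case DOUBLE: Double.parseDouble(row[i]); break;
                    case STRING: break; // any text is acceptable
                }
            } catch (NumberFormatException e) {
                return false; // this cell does not fit this schema's type
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ColType[] schema1 = { ColType.STRING, ColType.INT };
        ColType[] schema2 = { ColType.STRING, ColType.DOUBLE };
        String[] row = { "widget", "3.14" };
        System.out.println(matches(row, schema1)); // false: "3.14" is not an int
        System.out.println(matches(row, schema2)); // true
    }
}
```

Because the input metadata is all-String, no row is rejected before this probing step, which is the point of the suggestion.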
Hi,
I had a similar situation.
I created a reject row output on tFileInputExcel and pointed it at a reject file.
Some data cannot be read by Talend as the declared type, for example a floating-point value in a column whose data type is Integer; such rows get rejected at tFileInputExcel itself.
I hope it helps
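To illustrate the rejection described above: when the input schema declares Integer, the typed parse of a floating-point cell fails before the row reaches any downstream component. A minimal plain-Java illustration (not Talend-generated code; the class name and cell value are made up):

```java
// Why a floating-point cell is rejected by an Integer-typed input column:
// the integer parse throws before the row can continue down the flow.
public class RejectDemo {
    public static void main(String[] args) {
        String cell = "3.14"; // value read from the Excel cell
        try {
            int v = Integer.parseInt(cell); // what an Integer column implies
            System.out.println("accepted: " + v);
        } catch (NumberFormatException e) {
            // In a job, this row would go to the tFileInputExcel reject flow.
            System.out.println("rejected: " + cell);
        }
    }
}
```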
At the moment my process looks like this.
I read the Excel file with schema 1 applied, then compare against the same schema. If there are errors (row40) I can filter by error code and attempt to process the file using a different schema. My problem is that not all rows are submitted through the tSchemaComplianceCheck component, as some rows are rejected earlier (row1).
I'd love to have only one reject flow where I can capture all ingested rows that didn't fit schema 1, compare those against schema 2, and if they pass, process the file against schema 2 instead. How does my setup compare to yours, @gatha_vdm?
Talend does a basic schema check at tFileInputExcel, because it tries to store the data in the respective data types.
Option 1
Create a metadata with all the columns as String, without any not-null constraints.
In the next stage you can convert them to the respective data types (the data might still fail at this step).
Option 2
Use the reject output of tFileInputExcel in the same way you want to use tSchemaComplianceCheck.
Cheers!
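Option 1's deferred conversion can be sketched in plain Java (illustrative only, not Talend-generated; the helper name is made up). The conversion returns null on failure rather than throwing, so the row can be routed to an error flow in a later step, e.g. inside a tJavaRow or tMap expression:

```java
// Sketch of Option 1: columns arrive as nullable Strings, and the typed
// conversion happens downstream, where failures are handled explicitly
// instead of being silently rejected at the input component.
public class ConvertLater {

    // Returns the Integer value, or null when the text does not fit,
    // so the caller can route the row to an error flow.
    static Integer toIntOrNull(String s) {
        if (s == null || s.trim().isEmpty()) return null;
        try {
            return Integer.valueOf(s.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toIntOrNull("42"));   // 42
        System.out.println(toIntOrNull("3.14")); // null -> handle downstream
    }
}
```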
Should enabling "Die on Error" for the tFileInputExcel mean all my rows from the Excel file will get sent to the next component (tSchemaComplianceCheck), as there isn't a "Reject" flow to handle them?
Hi,
If there is a record that could go into the reject flow, then the job might fail entirely if you choose to die on error.
Cheers!
Gatha
I don't have that many (2 or 3 per file type), but I'm not sure it's scalable if the number of schemas increases. Would your suggestion not cause problems when a column's contents are shorter than the maximum? i.e. could 2 columns be detected as 1?