Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
mobmsc2
Creator
Creator

How to Handle Validating Excel Input where input format/schema changes during iteration

I'm having trouble gracefully handling changes in my Excel input during job execution. I have numerous monthly reports to ingest but the columns included in the reports is not static (they change between months). To cope with this I have used tSchemaComplianceCheck to validate the main flow after tFileInputExcel with 1 compliance check for each schema type I encounter in testing. My problem is the tFileInputExcel also has an input schema associated with itself so some rows are being rejected at this stage before they get into my schema checking stage. 

 

What is best practice when dealing with changeable inputs during a job? and is it possible to stop Excel performing any checks so that at least all my rows pass through my schema compliance check 

Labels (2)
1 Solution

Accepted Solutions
Anonymous
Not applicable

Hi,

 

How many schemas have you got to compare?

What is the maximum number of columns? What is the max data length possible in any column?

Consider  you have 10 columns max and 50 is the max length

 

create a metadata with 10 columns with 50 character length and string as a data type.

 

you may load that into a thashoutput  and use thashinput  everytime to compare in each schema type

 

Cheers!

Gatha

 

View solution in original post

10 Replies
Anonymous
Not applicable

Hi,

I had a similiar situation. 

I have created reject row output on tFileInputExcel and pointed to reject file once again.

Because some of the data cannot be read by Talend

like floating number data and the data type is Integer, this will get rejected at

This will get rejected at tFileInputExcel itself.  

 

I hope it helps

 

mobmsc2
Creator
Creator
Author

At the moment my process looks like this. 

I read the Excel file with schema 1 applied then compare to the same schema. If there is errors (row40) I can filter by error code at attempt to process the file using a different schema. My problem is not all rows are submitted through the schemaCompliance component as rows get rejected (row1).

 

I'd love to have only 1 reject row where I can capture all ingested rows that didn't fit schema 1 and then compare those against schema 2 and if correct them process the file against schema 2 instead. How does my setup compare to yours @gatha_vdm

0683p000009Luqi.png

Anonymous
Not applicable

Talend does a basic schema check at tFileInputExcel.

Because it tries to store the data in the respective data type.

 

Option1

Create a metadata with all the column as a string without any not null constraint. 

In the next stage, you may convert them to the respective data type(The data might fail in this step itself.). 

 

Option 2

Pls use the reject output of tFileInputExcel similar way you want to use the tSchemaCompilationCheck

 

Cheers! 

Anonymous
Not applicable

Talend does a basic schema check at tFileInputExcel.

Because it tries to store the data in the respective data type.

 

Option1

Create a metadata with all the column as a string without any not null constraint. 

In the next stage, you may convert them to the respective data type(The data might fail in this step itself.). 

 

Option 2

Pls use the reject output of tFileInputExcel similar way you want to use the tSchemaCompilationCheck

 

Cheers! 

mobmsc2
Creator
Creator
Author

Should "enabling" the "Die on Error" for the tFileInputExcel mean all my rows from the excel file will get sent to the next componment (tSchema) as there isn't a "Reject" flow to handle?

Anonymous
Not applicable

Hi,

 

if there is a possible record can go into reject flow then the program might fail if you choose to die on error.

 

Cheers!

Gatha

 

mobmsc2
Creator
Creator
Author

Pity, the solution I have appears to need to iterate through the entire file even when a read-error is detected in tFileInputExcel before the step where I compare it to a different schema can start.
Anonymous
Not applicable

Hi,

 

How many schemas have you got to compare?

What is the maximum number of columns? What is the max data length possible in any column?

Consider  you have 10 columns max and 50 is the max length

 

create a metadata with 10 columns with 50 character length and string as a data type.

 

you may load that into a thashoutput  and use thashinput  everytime to compare in each schema type

 

Cheers!

Gatha

 

mobmsc2
Creator
Creator
Author

I don't have that many (2 or 3 per file type) but not sure its scalable if the number of schema increases. Would your suggestion not cause problems when a columns contents are shorter than max? i.e. could 2 columns be detected as 1 ?