
tSchemaComplianceCheck: type check doesn't work?
Hello everyone,
I'm trying out tSchemaComplianceCheck with a simple test job:
Here is the tFileInputDelimited schema:
tSchemaComplianceCheck:
The input delimited file:
However, here is what I get when I run the job:
I was expecting to find my D2 line in tLogRow2, as COL3 is not an Integer.
Or have I misunderstood how the tSchemaComplianceCheck component works?
Thank you in advance 🙂
Accepted Solutions

This can be a bit confusing to start with. The problem here is that you are setting the type of the Integer column in the tFileInputDelimited schema. That component reads the data in (flat file data is essentially all String) and converts, or tries to convert, it to String, String and Integer columns. If the conversion is not possible, the row fails at that point. As you can see from the error at the top of your log, tFileInputDelimited could not convert this column, so it rejected the row before it ever reached the tSchemaComplianceCheck.
To test the tSchemaComplianceCheck, you need to read every column of your flat file in as a String. The assumption is that, at this point, you do not yet know whether the content of the file is correct; there are often errors. So you read it in as Strings, then use the tSchemaComplianceCheck to ensure that the data meets the expected schema, and only then convert it.
So, for the following data....
col1; col2; col3
aa; bb; 1
cd; hj; 3
df; gh; t
....col1 is a String, col2 is a String and col3 is meant to be an Integer. However, ALL of the data is read in as Strings (the safest data type for files). We then connect the input to a tSchemaComplianceCheck component that expects col1 to be a String, col2 to be a String and col3 to be an Integer. If we connect the tSchemaComplianceCheck to two tLogRows (as you have done), we see the following result....
tLogRow1
aa|bb|1
cd|hj|3
tLogRow2
df|gh|t|2|newColumn2:wrong type
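
To make the idea concrete outside of Talend, here is a minimal plain-Java sketch of the same pattern: keep every field as a String, test whether the third field would parse as an integer, and route the row to a main list or a reject list. This is not the code Talend generates; the class name `SchemaCheckSketch` and the helpers `isInteger` and `check` are made up for illustration, and the sample rows are the ones above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaCheckSketch {

    // Hypothetical helper: true when the text can be parsed as an int.
    static boolean isInteger(String value) {
        try {
            Integer.parseInt(value.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Routes each raw line to "main" when it has three fields and the
    // third one looks like an integer, otherwise to "reject".
    // Nothing is converted here: every field stays a String.
    static void check(List<String> lines, List<String> main, List<String> reject) {
        for (String line : lines) {
            String[] cols = line.split(";");
            if (cols.length == 3 && isInteger(cols[2])) {
                main.add(line);
            } else {
                reject.add(line);
            }
        }
    }

    public static void main(String[] args) {
        List<String> main = new ArrayList<>();
        List<String> reject = new ArrayList<>();
        check(Arrays.asList("aa; bb; 1", "cd; hj; 3", "df; gh; t"), main, reject);
        System.out.println("main:   " + main);
        System.out.println("reject: " + reject);
    }
}
```

The point of the sketch is the ordering: the type test happens while the value is still a String, so a bad value ends up in the reject list instead of failing the read.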

Thank you rhall, I understand now and everything works!
In this case, right after the tSchemaComplianceCheck, a tConvertType is needed to convert each column from String to the real type we want.
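
As a rough plain-Java analogy for that conversion step (again, not what tConvertType actually runs), a row that has already passed the check can be converted safely because the integer field is known to parse. The names `ConvertTypeSketch`, `TypedRow` and `convert` are hypothetical.

```java
public class ConvertTypeSketch {

    // Hypothetical typed row: two Strings and one int, matching the
    // col1/col2/col3 layout used in this thread.
    static final class TypedRow {
        final String col1;
        final String col2;
        final int col3;

        TypedRow(String col1, String col2, int col3) {
            this.col1 = col1;
            this.col2 = col2;
            this.col3 = col3;
        }
    }

    // Converts an already-validated row of Strings to its real types.
    // Integer.parseInt would throw NumberFormatException on a bad value,
    // which is why this step belongs after the compliance check.
    static TypedRow convert(String col1, String col2, String col3) {
        return new TypedRow(col1.trim(), col2.trim(), Integer.parseInt(col3.trim()));
    }

    public static void main(String[] args) {
        TypedRow row = convert("aa", " bb", " 1");
        System.out.println(row.col1 + "|" + row.col2 + "|" + row.col3);
    }
}
```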
