Re: Validate Schema of Entire File - Not Just Row ... - Qlik Community

ml1662663516 · 2023-01-12

We are reading in a file using tFileInputDelimited and validating the column formats against the schema using tSchemaComplianceCheck.

We see that we can use the Main and Reject flows to check each row of the file.

However, we do not want to process only the good rows.

We would like to set up the job so that it processes the entire file only if there are no bad rows. Kind of an all-or-nothing scenario.

Does anybody have any ideas on which components and flow we could use to implement such a scenario?

Thank you!

Anonymous · 2023-01-12

There is no component that I am aware of (new components arrive all the time) that checks the whole file at once. What you could do instead is carry on checking line by line and store the valid output in memory. If any rows are rejected, you do not proceed with the file. If no rows are rejected, you then process the data you have kept in memory.

To do this, you could use the tHSQLDB components (https://help.talend.com/r/en-US/7.3/hsqldb/hsqldb) or the Hash components (tHashInput - https://help.talend.com/r/en-US/8.0/technical/thashinput and tHashOutput - https://help.talend.com/r/en-US/8.0/technical/thashoutput).
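The validate-everything-first idea can be sketched outside Talend in plain Java. This is only an illustration under assumptions: the semicolon delimiter, the three-column schema (`id`, `name`, `created`) and all class and method names are hypothetical, and the `ArrayList` simply plays the role that tHashOutput/tHashInput or an HSQLDB table would play in a real job.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

class AllOrNothingLoader {

    /** Rows that passed validation, plus a count of the ones that did not. */
    static final class Result {
        final List<String[]> accepted = new ArrayList<>();
        int rejected = 0;
    }

    // Hypothetical schema: id (integer); name (non-empty); created (ISO date yyyy-MM-dd).
    static boolean isValid(String line) {
        String[] cols = line.split(";", -1); // -1 keeps trailing empty columns
        if (cols.length != 3) return false;
        try {
            Integer.parseInt(cols[0].trim());
            LocalDate.parse(cols[2].trim());
        } catch (NumberFormatException | DateTimeParseException e) {
            return false;
        }
        return !cols[1].trim().isEmpty();
    }

    // Phase 1: check every row and buffer the good ones in memory (the tHashOutput role).
    static Result check(List<String> lines) {
        Result result = new Result();
        for (String line : lines) {
            if (isValid(line)) {
                result.accepted.add(line.split(";", -1));
            } else {
                result.rejected++;
            }
        }
        return result;
    }

    // Phase 2: process the buffered rows only if nothing was rejected (the tHashInput role).
    static int process(List<String> lines) {
        Result result = check(lines);
        if (result.rejected > 0) {
            System.out.println("File rejected: " + result.rejected + " bad row(s); nothing processed.");
            return 0;
        }
        for (String[] row : result.accepted) {
            System.out.println("Processing id=" + row[0]); // real output step goes here
        }
        return result.accepted.size();
    }

    public static void main(String[] args) {
        List<String> clean = List.of("1;Alice;2023-01-12", "2;Bob;2023-01-13");
        List<String> mixed = List.of("1;Alice;2023-01-12", "x;Bob;2023-13-45");
        System.out.println(process(clean)); // prints both rows, then 2
        System.out.println(process(mixed)); // prints the rejection message, then 0
    }
}
```

The point of the two phases is that nothing reaches the processing loop until every row has been checked, so one bad row anywhere in the file means no rows are processed. Keeping every row in memory may not be practical for very large files.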

ml1662663516 · 2023-01-13

Thank you for your response.

We did look into using the components you mentioned.

However, our final solution was to use the tFileInputRegex component, split the rows with the Main and Reject connectors, and increment a global variable counter whenever a bad match is found.

We branch off from that subjob with an OnSubjobOk trigger, then check the global variable and proceed with a Run if connector.
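The counter-and-gate flow described above can be simulated in plain Java to show the logic. This is a sketch under assumptions: the row pattern, the `badRowCount` key and the class name are hypothetical, and the `HashMap` only stands in for the job's `globalMap`. In the job itself the increment would sit in a component on the Reject flow (a tJavaRow is assumed here) and the comparison would be the Run if condition.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class RegexGateSketch {

    // Hypothetical row pattern: integer id ; non-empty name ; yyyy-MM-dd date.
    static final Pattern ROW = Pattern.compile("^(\\d+);([^;]+);(\\d{4}-\\d{2}-\\d{2})$");

    // Stand-in for Talend's globalMap (a Map<String, Object> shared across subjobs).
    static final Map<String, Object> globalMap = new HashMap<>();

    // First subjob: count every line that does not match the pattern.
    static void countBadRows(List<String> lines) {
        globalMap.put("badRowCount", 0); // initialise before the flow starts
        for (String line : lines) {
            if (!ROW.matcher(line).matches()) {
                // What a component on the Reject branch might do for each rejected row.
                globalMap.put("badRowCount", (Integer) globalMap.get("badRowCount") + 1);
            }
        }
    }

    // Equivalent of the Run if condition evaluated after OnSubjobOk.
    static boolean fileIsClean() {
        return ((Integer) globalMap.get("badRowCount")) == 0;
    }

    public static void main(String[] args) {
        countBadRows(List.of("1;Alice;2023-01-12", "2;Bob;2023-01-13"));
        System.out.println(fileIsClean()); // true
        countBadRows(List.of("1;Alice;2023-01-12", "two;Bob;13/01/2023"));
        System.out.println(globalMap.get("badRowCount") + " " + fileIsClean()); // 1 false
    }
}
```

Initialising the counter to 0 before the flow starts matters: if it were never set, a file with no bad rows would leave the key missing and the condition would fail with a NullPointerException when unboxing.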

I don't know if that is the most efficient workflow, but it does seem to be working correctly and reliably.

Thanks!

Anonymous · 2023-01-13

So long as it suits your requirements 🙂

Validate Schema of Entire File - Not Just Row by Row

Talend Big Data

v7.x