We’d like to request a feature that provides a simple method to calculate a row checksum for change files.
When using QR to produce change files into a file-based target (e.g. GCP GCS, AWS S3, file system), we’d like each row of data to carry, as its last column, a checksum for that row. After we ingest the data into a database target (e.g. our data warehouse), we would like to recalculate the checksum on the ingested row (minus the QR-provided checksum column) and compare the two values, to validate that we ingested exactly the same data as was contained in the source file.
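A rough sketch of the comparison we have in mind, in Python purely for illustration (row_checksum, the separator, and the NULL handling below are our own assumptions, not an existing QR function; the comparison only works if QR’s exact serialisation rules — casting, NULL handling, encoding — are documented so they can be reproduced on the target side):

import hashlib

def row_checksum(values, sep="\x1f"):
    # Hex SHA-256 over the concatenated string form of the columns.
    # A separator stops adjacent columns colliding ("ab" + "c" vs
    # "a" + "bc"); NULLs are serialised as empty strings here.
    joined = sep.join("" if v is None else str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# A row as read back from the warehouse, compared against the checksum
# that QR would have appended as the last column of the change record.
ingested = ["1001", "The customer Bradford Holders ™", "2024-05-01"]
file_checksum = row_checksum(ingested)  # stand-in for the QR-written value
print("match" if row_checksum(ingested) == file_checksum else "MISMATCH")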
This helps us evidence to regulators that not only did we ingest the correct number of rows (as provided by the DFM file), but that we can also guarantee the data is exactly the same at a column level.
e.g. if we have a comments column in a table that contains special characters or unusual sequences such as: “The customer Bradford Holders ™ is interacting with ‘’ \n its parent company 读写汉字 - 学中文”, we want to be able to test that the ingestion process didn’t lose or inadvertently transform any of that data. In this case, for example, the value might be ingested as “The customer Bradford Holders is interacting with its parent company”, with the special characters silently dropped and the remainder truncated, perhaps at the \n.
So we are looking for the most efficient way to validate the integrity of the ingested data.
The ideal would be for QR to support a single function that appends a row checksum without having to specify every column, e.g.
hash_sha256(cast(column1 as varchar) || cast(column2 as varchar) || cast(column3 as varchar) || cast(columnx as varchar))
but we haven’t found a way to do that in the docs we’ve read.
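As a stop-gap we could generate that expression from the table’s column list instead of hand-writing it per table; a minimal sketch (checksum_expression is our own hypothetical helper, and hash_sha256 is simply the function name from the example above):

def checksum_expression(columns):
    # Build the cast/concat/hash expression text from a list of column
    # names; the literal '|' separator between casts prevents
    # adjacent-column collisions such as ('ab','c') vs ('a','bc').
    parts = " || '|' || ".join(f"cast({c} as varchar)" for c in columns)
    return f"hash_sha256({parts})"

print(checksum_expression(["column1", "column2", "column3"]))
# -> hash_sha256(cast(column1 as varchar) || '|' || cast(column2 as varchar) || '|' || cast(column3 as varchar))

One thing to watch with the concatenation approach: in most SQL dialects a NULL propagates through ||, so each cast would likely need wrapping in coalesce(..., '') for rows containing NULLs to produce a checksum at all.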