Hello All,
We are using Replicate to move data from our operational systems (Oracle, SQL Server, etc.) to our cloud-based data warehouse. We run Replicate 24x7 to get real-time data ingestion into the warehouse.
We are looking to implement process integrity checks to verify that all data from the source actually arrived at the target.
Has anyone implemented solutions to solve this issue?
Thanks!
Hi. We are also at a point where this is becoming necessary. Did you end up finding a solution, or are you still looking? I was thinking of maybe building Qlik Sense apps that compare the Landing layer to the Warehouse layer, or something like that 🙂
There are no hard and fast solutions for this.
Several folks run counts on source and target and compare them. This is best done with a filter on a 'last update' column, restricting the count to a specific time window, to avoid false mismatches caused by replication latency.
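That windowed count comparison can be sketched roughly as below. This is a minimal illustration, not Replicate functionality: the `orders` table, the `last_update` column, and the sqlite3 in-memory databases are stand-ins for your real source and target connections.

```python
import sqlite3

def windowed_count(conn, table, start, end):
    """Count rows whose last_update falls inside [start, end)."""
    cur = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE last_update >= ? AND last_update < ?",
        (start, end),
    )
    return cur.fetchone()[0]

# Stand-ins for the real source and target database connections.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute("CREATE TABLE orders (id INTEGER, last_update TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "2024-01-01T10:00"), (2, "2024-01-01T10:30")],
    )

# Compare a closed window that ends well behind the current replication
# latency, so in-flight changes don't produce false alarms.
src = windowed_count(source, "orders", "2024-01-01T00:00", "2024-01-01T11:00")
tgt = windowed_count(target, "orders", "2024-01-01T00:00", "2024-01-01T11:00")
print("in sync" if src == tgt else f"mismatch: {src} vs {tgt}")
```

The key design point is the window boundary: count only rows old enough that replication has certainly caught up, or the check will flap.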
Others just watch max('last update') on a table known to change frequently, if such a column exists, to see whether source and target are sufficiently in sync. Some even run CRC checks on the concatenated row columns to verify not only that all rows are there, but that they are identical.
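The CRC idea can be sketched as hashing each row's concatenated column values and comparing per-key digests between source and target. Again a hypothetical illustration with sqlite3 stand-ins and made-up table/column names; MD5 is used purely as a fast row fingerprint, not for security.

```python
import hashlib
import sqlite3

def row_digests(conn, table, key, cols):
    """Map primary key -> MD5 of the concatenated non-key column values."""
    col_list = ", ".join([key] + cols)
    digests = {}
    for row in conn.execute(f"SELECT {col_list} FROM {table}"):
        # NULL-safe concatenation with a separator to avoid value collisions.
        payload = "|".join("" if v is None else str(v) for v in row[1:])
        digests[row[0]] = hashlib.md5(payload.encode()).hexdigest()
    return digests

# Stand-ins for the real source and target connections.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
rows = [(1, "shipped", 100), (2, "open", 250)]
for conn in (source, target):
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Simulate a target row that missed an update.
target.execute("UPDATE orders SET status = 'open' WHERE id = 1")

src = row_digests(source, "orders", "id", ["status", "amount"])
tgt = row_digests(target, "orders", "id", ["status", "amount"])
missing = sorted(src.keys() - tgt.keys())
differing = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
print("missing:", missing, "differing:", differing)
```

On real tables you would push the hashing into SQL on each side and compare only the digests, since pulling every row across the wire defeats the purpose.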
It's all a mess, and rarely solid. The best thing to do is trust the process! Of course, trust but verify some: check the reptask logs for errors; check the attrep_apply_exceptions table; check coarse change counts against expectations; and maybe after that add a row count on source and target.
There is one thing I like for confidence and (performance) monitoring: a heartbeat table, a small source table into which a timestamped row is inserted every minute and which is replicated like any other table.
Such a table is end-to-end: no need to ask Replicate whether all was well!
It should get 1440 rows per day per task (assuming a 1-minute interval), and it allows you to plot or report avg/min/max end-to-end latency over multiple days.
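Computing those latency stats from the heartbeat rows is straightforward. A minimal sketch, assuming each replicated row carries the timestamp written on the source plus the timestamp it landed in the warehouse (both column names and sample values are hypothetical):

```python
from datetime import datetime

# Hypothetical heartbeat rows as seen on the target:
# (source_ts, target_arrival_ts)
rows = [
    ("2024-01-01T10:00:00", "2024-01-01T10:00:04"),
    ("2024-01-01T10:01:00", "2024-01-01T10:01:02"),
    ("2024-01-01T10:02:00", "2024-01-01T10:02:09"),
]

def latency_stats(rows):
    """Return (min, avg, max) end-to-end latency in seconds."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    lat = [
        (datetime.strptime(t, fmt) - datetime.strptime(s, fmt)).total_seconds()
        for s, t in rows
    ]
    return min(lat), sum(lat) / len(lat), max(lat)

lo, avg, hi = latency_stats(rows)
print(f"latency min={lo}s avg={avg}s max={hi}s")  # min=2.0s avg=5.0s max=9.0s
```

A daily check that the row count is close to 1440 per task then doubles as a completeness check on the heartbeat itself.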
Hope this helps,
Hein