dba_guy
Contributor II

Monitoring and re-syncing questions

Hi,

I have a couple questions.

1. How do people monitor whether the data is the same on both the source and the destination of a replication task? I'm curious what kind of solutions people have come up with.

2. For a table that is out of sync or missing rows at the destination, we typically reload the single table. For very large tables with 500 million rows that isn't feasible, as reloading can take hours. Has anyone found a way to get such a table back in sync without doing a full reload?

Thank you.

2 Replies
JitenderR
Employee

@dba_guy There is no straightforward or single answer to these questions, and the right solution also depends on how you have designed the task. For example, if your task applies user-based filtering on the source, you need to accommodate that filter in your solution.

Having said that, QEM (Qlik Enterprise Manager) provides a wide variety of APIs, including a set that works at the table level. Please refer to the post below, which shows how to check whether a task is continuously replicating. In the same way, you can use other APIs to capture row counts from the target tables being replicated, capture the corresponding source-level counts, and write both into an audit table for comparison. A thorough script will be needed to ensure the counts match exactly.

Hope this helps! 

https://community.qlik.com/t5/Qlik-Replicate-Discussions/REST-API-Calls-to-Get-Table-Level-Details-o...
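To illustrate the audit-table idea, here is a rough sketch in Python. Everything environment-specific in it is an assumption to replace: the QEM URL and credentials, server/task names, the session-header and JSON field names, the ODBC DSNs, and the replication_audit table. Check the Enterprise Manager API guide and the linked post for the exact table-level calls.

```python
import datetime

import pyodbc
import requests

# Placeholder URL, credentials, and server/task names -- adjust for your environment.
QEM_BASE = "https://qem-host/attunityenterprisemanager/api/v1"
SERVER, TASK = "MyReplicateServer", "MyTask"

# 1. Log in to Enterprise Manager. The session header name used here is an
#    assumption; confirm it against the API guide.
login = requests.get(f"{QEM_BASE}/login", auth=("domain\\svc_user", "password"))
session = {"EnterpriseManager.APISessionID": login.headers["EnterpriseManager.APISessionID"]}

# 2. Ask QEM which tables the task is replicating (endpoint path and JSON
#    field names are assumptions; see the linked post for the real calls).
tables = requests.get(
    f"{QEM_BASE}/servers/{SERVER}/tasks/{TASK}/tables", headers=session
).json()["items"]

# 3. Count rows on both sides and log the result to an audit table on the target.
src = pyodbc.connect("DSN=source_db")   # source endpoint connection
tgt = pyodbc.connect("DSN=target_db")   # target endpoint connection
audit = tgt.cursor()

for t in tables:
    schema, name = t["schema"], t["name"]
    src_count = src.cursor().execute(f"SELECT COUNT(*) FROM {schema}.{name}").fetchone()[0]
    tgt_count = tgt.cursor().execute(f"SELECT COUNT(*) FROM {schema}.{name}").fetchone()[0]
    audit.execute(
        "INSERT INTO replication_audit (checked_at, table_name, src_rows, tgt_rows, in_sync) "
        "VALUES (?, ?, ?, ?, ?)",
        datetime.datetime.now(), f"{schema}.{name}", src_count, tgt_count, src_count == tgt_count,
    )
tgt.commit()
```

Note that if the task filters rows at the source, the source-side COUNT(*) has to apply the same predicate, otherwise the counts will never line up.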

 

Regards

JR

 

JitenderR
Employee

@dba_guy For question #2, it depends on the endpoint, but data validation still needs to be done by DBAs or end users. For instance, if you are sure that only the last 7 days' worth of data is missing, you can start the task from a timestamp, or run the task in UPSERT mode from a specific timestamp or SCN. Various other factors also come into the picture, such as the availability of the database logs and whether the endpoint supports starting from a timestamp.
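As a rough sketch of the "start from timestamp" route driven through the QEM API (the run option and parameter names below are assumptions; verify them against the Enterprise Manager API guide and confirm your source endpoint supports starting from a timestamp at all):

```python
import requests

# Placeholder URL, credentials, and server/task names -- adjust for your environment.
QEM_BASE = "https://qem-host/attunityenterprisemanager/api/v1"
SERVER, TASK = "MyReplicateServer", "MyTask"

login = requests.get(f"{QEM_BASE}/login", auth=("domain\\svc_user", "password"))
session = {"EnterpriseManager.APISessionID": login.headers["EnterpriseManager.APISessionID"]}

# Restart CDC from a point just before the suspected gap (UTC timestamp).
requests.post(
    f"{QEM_BASE}/servers/{SERVER}/tasks/{TASK}",
    headers=session,
    params={
        "action": "run",
        "option": "RESUME_PROCESSING_FROM_TIMESTAMP",  # assumed option name
        "cdcposition": "2024-06-01T00:00:00Z",          # assumed parameter name
    },
)
```

Replaying a window of changes this way will hit rows that already exist on the target, so the task's apply error handling needs to turn duplicate-key INSERTs into UPDATEs; that is the UPSERT behaviour referred to above.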

Also, note that loading a table of 500 million or even a billion records should be a weekend-only activity, and only undertaken when both the source and target endpoints support parallel loads and full-load performance is tuned for maximum throughput.

Hope this answers your questions. Let me know if you have any additional questions.

Regards

JR