Hi Qlik Support,
Is there a recommendation on how to patch missing data in Qlik Replicate?
We have a scenario where data is missing from the target for a particular time frame (e.g. 6th June 10:35AM - 11:00AM). This is because when restarting the task from a timestamp, the incorrect timestamp was used, which in turn caused source data to be skipped.
Is it possible to only replay the changes in a specified time frame? And if so, is there a best practice method or considerations that need to be taken into account when doing so?
Apologies if this has already been addressed in a previous community post.
Thanks,
Nak
Hello Nak, @NakulanR
How are you?
We can configure the Replicate task to start reading changes from a specified timestamp. Additionally, the task can be set to automatically stop once the source change record's commit time reaches a specific timestamp, as shown below:
repctl execute task=SQLtoSQL 2 flags=1 cdcposition="timestamp:2024-06-06T00:10:30.450" stop_at="commit_time:2024-06-07T10:10:30.450"
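If it helps, here is a small helper to assemble that command. This is a sketch only: plain Python, with the task name and window timestamps as placeholders, and you should verify whether your source endpoint interprets the timestamp as UTC or local time.

# Sketch: build the repctl command for a bounded replay window.
# Task name and timestamps are placeholders -- substitute your own.
from datetime import datetime

task = "SQLtoSQL"
window_start = datetime(2024, 6, 6, 10, 35, 0)  # start reading changes here
window_end = datetime(2024, 6, 6, 11, 0, 0)     # stop at this commit time

fmt = "%Y-%m-%dT%H:%M:%S.%f"
start_ts = window_start.strftime(fmt)[:-3]  # trim microseconds to milliseconds
end_ts = window_end.strftime(fmt)[:-3]

cmd = (f'repctl execute task={task} 2 flags=1 '
       f'cdcposition="timestamp:{start_ts}" '
       f'stop_at="commit_time:{end_ts}"')
print(cmd)  # review the window, then run it on the Replicate server

Printing instead of executing keeps the window easy to double-check before you commit to it.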
Please run careful acceptance tests in a lower environment before running this on the PROD system.
Hope this helps.
John.
The 'proper' solution is a reload - this weekend?
It is tempting to 'fill in' the blanks by cloning the task, changing the error handling to 'update if exists', starting at the earlier time, and stopping once the latency has dropped to an hour less than it was at that start. The problem with that is when there was an insert followed by an update for the same row a few hours later: replaying the insert would reapply stale data. Ditto for an update which was later updated again.
Therefore, once you restart with 'tables already loaded' from that earlier time, you have to let it run to 'now', and you probably want to temporarily set the error handling for "Duplicate key when applying INSERT:" to IGNORE, to avoid rows being reset to their first version. For updates you cannot really tell whether there was a later update, so they will have to be reapplied. You could temporarily stretch the CDC tuning to a minimum of 29 seconds (the UI maximum) and a maximum of 3600, or some such. Once the restarted task has low latency, you can stop it, reset the changes, and resume.
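To make the stale-data hazard concrete, here is a small simulation (plain Python, not Replicate internals; the rows and policy names are illustrative assumptions). Replaying the original INSERT with 'update if exists' semantics resets a row that a later UPDATE had already brought current, while IGNORE on the duplicate-key INSERT preserves the newer version:

# Simplified model of a keyed target table and a replayed change stream.
target = {1: {"id": 1, "val": "v2"}}  # row was already updated to v2 later on

# Change replayed from the earlier timestamp: the original INSERT (stale v1).
replayed = [("INSERT", {"id": 1, "val": "v1"})]

def apply(policy):
    tbl = dict(target)
    for op, row in replayed:
        if op == "INSERT" and row["id"] in tbl:
            if policy == "update_if_exists":
                tbl[row["id"]] = row  # stale v1 overwrites the newer v2
            # policy == "ignore_duplicate": drop the change, v2 survives
        else:
            tbl[row["id"]] = row
    return tbl

print(apply("update_if_exists"))  # {1: {'id': 1, 'val': 'v1'}} -- reset to stale
print(apply("ignore_duplicate"))  # {1: {'id': 1, 'val': 'v2'}} -- stays current

Updates have no such guard: a replayed UPDATE simply lands, which is why the task is left running to 'now', so any later version gets reapplied on top.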
fwiw, Hein,
Hi @NakulanR ,
Thank you for reaching out to the Qlik Community.
If your transaction log files are still available, you can resume the task from the specified time. Keep in mind that if some data has already been transferred to the target, duplicate records may be inserted if your target table does not have a primary key or unique index.
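A tiny illustration of that caveat (plain Python, values made up): with a unique key the replayed row collapses onto the existing one, while without any key the target simply accumulates a second copy:

# With a primary key: the replayed insert lands on the same key, one row remains.
keyed = {101: ("101", "order-A")}
keyed[101] = ("101", "order-A")  # replayed change
print(len(keyed))                # 1

# Without a key: the same replayed change simply appends a duplicate.
heap = [("101", "order-A")]
heap.append(("101", "order-A"))  # replayed change
print(len(heap))                 # 2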
Regards,
Desmond
Hello @NakulanR
Thank you for reaching out to the Qlik Community.
There are a few things you may consider. If you have identified the affected tables, you can re-add those tables to the task; the full load will resync the data and changes are captured from there.
If many tables are affected, I would suggest reloading the whole task, as that takes care of any mismatch.
If you are using Oracle, you can mine the redo logs with LogMiner, export the SQL statements for the missing window, and replay them on the target endpoint; a sketch follows below.
Replaying changes, or starting from an earlier timestamp, is prone to data mismatch and duplicate data.
Re-adding the identified tables, or a full task reload, is the recommended way to avoid data mismatch.
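For the LogMiner route, a minimal sketch in Python of what the mining session could look like. This is an assumption-laden outline, not a verified procedure: it assumes the python-oracledb driver, EXECUTE privilege on DBMS_LOGMNR, and that the redo/archive logs covering the window still exist; the connection details and APP_SCHEMA are placeholders.

import oracledb  # assumes the python-oracledb driver is installed

conn = oracledb.connect(user="miner", password="***", dsn="srcdb")  # placeholders
cur = conn.cursor()

# On Oracle 19c+ (no CONTINUOUS_MINE) you may first need to register each
# redo/archive log covering the window via DBMS_LOGMNR.ADD_LOGFILE.
cur.execute("""
    BEGIN
        DBMS_LOGMNR.START_LOGMNR(
            STARTTIME => TO_DATE('2024-06-06 10:35:00', 'YYYY-MM-DD HH24:MI:SS'),
            ENDTIME   => TO_DATE('2024-06-06 11:00:00', 'YYYY-MM-DD HH24:MI:SS'),
            OPTIONS   => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
    END;""")

# Export the redo SQL for the affected schema; review it carefully before
# replaying anything on the target.
cur.execute("""
    SELECT sql_redo
    FROM   v$logmnr_contents
    WHERE  seg_owner = 'APP_SCHEMA'
      AND  operation IN ('INSERT', 'UPDATE', 'DELETE')""")
for (sql_redo,) in cur:
    print(sql_redo)

cur.execute("BEGIN DBMS_LOGMNR.END_LOGMNR; END;")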
Regards,
Sushil Kumar
Hi @NakulanR
Another approach would be to set a filter on the task. This is only feasible if you have a column in each of the tables with a datetime value you could use. Please refer to this knowledge article: Filter for last 90 days of data in Qlik Replicate - Qlik Community - 1880761
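As a hedged illustration only: in the table's record-selection condition (Replicate's expression builder uses SQLite syntax, and $LAST_MODIFIED here is a hypothetical column name, not one from the article), the missing window could be targeted with something like:

$LAST_MODIFIED >= '2024-06-06 10:35:00' AND $LAST_MODIFIED < '2024-06-06 11:00:00'

The string comparison only holds if the column's datetime format sorts lexically; see the linked article for the exact pattern Qlik recommends.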
Thanks,
Dana