Understanding Replicate CDC incoming changes

Alan_Wang · Aug 30, 2022 7:10:57 PM

Replicate Change Processing User Interface

Qlik Replicate's User Interface comes with a set of monitoring tools to help us understand and break down what is happening to the task on all levels. In this article, we will focus on the incoming changes from the CDC (Change Data Capture) process and understand where the changes are and what happens to the changes along the way.

To monitor the incoming changes for a task, we need to go to the Monitor tab, switch to the Change Processing tab, and click Incoming Changes.

[Image: Incoming Changes]

From this main screen, we see five vertical bars:

Incoming Changes
Accumulating - In Memory
Accumulating - On Disk
Applying - In Memory
Applying - On Disk

The incoming changes are grouped into two categories:

Accumulating
Applying

Each category is further broken down into two locations:

In Memory
On Disk

Incoming Changes is the total number of changes and transactions that are being read from the source database endpoint. In the picture below, you can see that 12 changes are read from 5 transactions.

[Image: Incoming Changes bar]

The two Accumulating bars are a breakdown of the Incoming Changes. Accumulating means the Replicate server is reading the changes from the source database endpoint.

In the picture below, you will see the task settings that determine whether incoming changes go to the In Memory bar or the On Disk bar. In this case, the task can use a maximum of 1024MB of memory, and a transaction can stay in memory for a maximum of 60 seconds. Changes that exceed this memory limit or transaction duration are held in the Accumulating - On Disk bar.
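The offload rule above can be sketched as a small Python model. This is a hypothetical illustration, not Replicate's implementation: `OpenTransaction` and `place_transactions` are made-up names, the 1024MB and 60-second thresholds are the example values from the screenshot, and the order in which transactions are considered (oldest first) is an assumption.

```python
from dataclasses import dataclass

# Example thresholds from the screenshot (hypothetical constants, not a Replicate API).
MAX_TXN_MEMORY_MB = 1024   # total memory available for open transactions
MAX_TXN_DURATION_S = 60    # longest a transaction may stay open in memory


@dataclass
class OpenTransaction:
    txn_id: int
    size_mb: float
    opened_at_s: float


def place_transactions(txns, now_s,
                       max_memory_mb=MAX_TXN_MEMORY_MB,
                       max_duration_s=MAX_TXN_DURATION_S):
    """Split open transactions into (in_memory, on_disk) lists.

    Simplified model: a transaction open longer than max_duration_s goes
    to disk; the others stay in memory, oldest first, until the memory
    budget is used up, and anything that no longer fits goes to disk.
    """
    in_memory, on_disk = [], []
    used_mb = 0.0
    for txn in sorted(txns, key=lambda t: t.opened_at_s):
        too_long = now_s - txn.opened_at_s > max_duration_s
        too_big = used_mb + txn.size_mb > max_memory_mb
        if too_long or too_big:
            on_disk.append(txn)
        else:
            in_memory.append(txn)
            used_mb += txn.size_mb
    return in_memory, on_disk


txns = [
    OpenTransaction(1, size_mb=100, opened_at_s=0),
    OpenTransaction(2, size_mb=900, opened_at_s=50),
    OpenTransaction(3, size_mb=200, opened_at_s=60),
]
in_memory, on_disk = place_transactions(txns, now_s=100)
print([t.txn_id for t in in_memory], [t.txn_id for t in on_disk])  # [2] [1, 3]
```

In this example, transaction 1 has been open for 100 seconds, so it exceeds the 60-second limit, and transaction 3 no longer fits once transaction 2 has used 900MB of the 1024MB budget. Only transaction 2 stays in memory.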

[Image: Change Processing Tuning values (memory and transaction duration limits)]

Incoming changes held in the In Memory bar can be passed on to the next step, where Replicate begins the apply process. Incoming changes held in the On Disk bar wait for memory to become available before they are moved to the In Memory bar.

[Image: Incoming Changes details]

As a reminder, the total number of changes/transactions in the two Accumulating bars equals the number shown in the Incoming Changes bar. The two Accumulating bars are just a breakdown of where the changes are currently stored: either in memory, or on disk waiting for memory to start the processing.
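To make the breakdown concrete, here is a hypothetical Python model of the two Accumulating bars. `TieredBuffer` is an illustrative name, not part of Replicate, and sizes are assumed to be in MB. Changes go to memory while there is room and otherwise wait on disk; when a change is handed off to the apply step, the freed memory is refilled from disk. The two tiers always add up to the total.

```python
from collections import deque


class TieredBuffer:
    """Hypothetical model of the Accumulating - In Memory / On Disk bars.

    Sizes are in MB. Not Replicate's implementation; for illustration only.
    """

    def __init__(self, memory_limit_mb):
        self.memory_limit_mb = memory_limit_mb
        self.in_memory = deque()  # sizes of changes held in memory
        self.on_disk = deque()    # sizes of changes waiting on disk, oldest first
        self.used_mb = 0

    def add(self, size_mb):
        # If changes are already waiting on disk, queue behind them to keep order.
        if not self.on_disk and self.used_mb + size_mb <= self.memory_limit_mb:
            self.in_memory.append(size_mb)
            self.used_mb += size_mb
        else:
            self.on_disk.append(size_mb)

    def hand_off(self):
        """Pass the oldest in-memory change to the apply step (assumes one exists),
        then move waiting changes from disk into the freed memory while they fit."""
        size_mb = self.in_memory.popleft()
        self.used_mb -= size_mb
        while self.on_disk and self.used_mb + self.on_disk[0] <= self.memory_limit_mb:
            promoted = self.on_disk.popleft()
            self.in_memory.append(promoted)
            self.used_mb += promoted
        return size_mb

    def total(self):
        # The two bars are only a breakdown of the same set of changes.
        return len(self.in_memory) + len(self.on_disk)


buf = TieredBuffer(memory_limit_mb=1024)
for size in [400, 400, 400, 100]:
    buf.add(size)
print(len(buf.in_memory), len(buf.on_disk), buf.total())  # 2 2 4
buf.hand_off()
print(len(buf.in_memory), len(buf.on_disk), buf.total())  # 3 0 3
```

Queuing new changes behind anything already on disk is a design choice in this sketch so that changes keep their arrival order; it is not a documented Replicate behavior.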

The next step of the process is the two Applying bars. Applying means Replicate is holding the changes on the Replicate server until the thresholds in the task settings are reached, at which point the batch of changes is applied to the target database endpoint.

Please refer to the Change Processing Tuning values you have set in the task settings to determine the wait time for the Replicate server before the batch of changes is applied to the target database.

In the picture below, you will see that the task has been set to move a batch of changes to the apply process when between 1 and 30 seconds have elapsed or when the batch of changes reaches 500MB in size. Note that these are just the default values and are not necessarily the values you should set for your task.
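The batch trigger described above can be sketched as a simple decision function. This is a simplified, hypothetical model: the constants are the example values from the screenshot, and `ready_early` is a made-up flag standing in for whatever Replicate uses to decide to apply somewhere inside the 1 to 30 second window, which this article does not describe.

```python
# Example values from the screenshot (hypothetical constants, not a Replicate API).
MIN_BATCH_WAIT_S = 1   # wait at least this long before applying a batch
MAX_BATCH_WAIT_S = 30  # apply a batch no later than this
FORCE_APPLY_MB = 500   # apply once the batch reaches this size


def should_apply_batch(elapsed_s, batch_mb, ready_early=False,
                       min_wait_s=MIN_BATCH_WAIT_S,
                       max_wait_s=MAX_BATCH_WAIT_S,
                       force_mb=FORCE_APPLY_MB):
    """Simplified batch trigger.

    ready_early is a hypothetical flag standing in for Replicate's own
    decision to apply inside the min/max window.
    """
    if batch_mb >= force_mb:
        return True          # size limit reached: apply regardless of time
    if elapsed_s < min_wait_s:
        return False         # too early: keep collecting changes
    if elapsed_s >= max_wait_s:
        return True          # waited the maximum time: apply
    return ready_early       # inside the window: depends on Replicate's own logic


print(should_apply_batch(0.5, 600))   # True  (size limit)
print(should_apply_batch(10, 20))     # False (inside window, not ready)
print(should_apply_batch(30, 20))     # True  (maximum wait reached)
```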

[Image: Change Processing Tuning values (batch timing and size)]

The Applying - In Memory and Applying - On Disk bars work the same way as the Accumulating - In Memory and Accumulating - On Disk bars. Up to 1024MB of memory can be used for the task before changes are stored on disk. As memory is freed by changes being applied to the target, changes move from the On Disk bar to the In Memory bar and are then applied to the target in the next batch.

Questions & Answers

When are changes captured from the source?

Changes are captured only while the task is in the Running state, shown by a green circle with a white triangle icon. A task can also be in the Stopped state (grey circle with a white square icon) or the Error state (red circle with a white X icon). Tasks in the Stopped or Error state do not capture changes.

[Images: Running, Error, and Stopped state icons]

What happens to the changes when I stop the task in the midst of the replication process?

Replicate will immediately stop capturing any more changes from the source database. Replicate will continue to apply changes to the target database for up to 30 minutes. After 30 minutes have passed, any remaining changes will be left in the Applying - On Disk bar. This means that changes are stored on the Replicate server disk. The remaining changes will be applied the next time the task is resumed.
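The stop behavior described above can be modeled as a loop with a 30-minute deadline. This is a hypothetical sketch: `stop_task` is an illustrative name, and the assumption that every pending batch takes the same fixed time to apply exists only to keep the arithmetic simple.

```python
def stop_task(pending, seconds_per_batch, grace_period_s=30 * 60):
    """Hypothetical model of stopping a task mid-replication.

    Capture has already stopped, so no new changes arrive. Pending batches
    keep being applied until the next one would not finish within the
    grace period; whatever is left stays on the Replicate server disk.
    Returns (applied, left_on_disk).
    """
    applied, remaining = [], list(pending)
    elapsed = 0
    while remaining and elapsed + seconds_per_batch <= grace_period_s:
        applied.append(remaining.pop(0))
        elapsed += seconds_per_batch
    return applied, remaining


# Five pending batches that each take 10 minutes to apply (an arbitrary example).
applied, left_on_disk = stop_task(["b1", "b2", "b3", "b4", "b5"], seconds_per_batch=600)
print(applied, left_on_disk)  # ['b1', 'b2', 'b3'] ['b4', 'b5']
```

In the example, three batches fit in the 30-minute window and the remaining two stay on disk until the task is resumed.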

What happens to the changes stored on the Replicate server disk if I choose to use the advanced run options to start the task from a timestamp instead of resuming the task?

If the "Start from a timestamp" option is used instead of the "Resume" option, all changes that are still stored on the Replicate server disk will be deleted. Replicate will start reading changes again from the chosen timestamp.

Will Replicate know which changes have not been applied to the target database or is there a chance Replicate will read the same changes twice?

Replicate uses an internal variable called stream position to keep track of changes that have been read. The stream position is the record position within the source database, which lets Replicate know the exact location of the last change read. The stream position also lets Replicate know the exact position of the last record applied to the target. This allows Replicate to stop and resume without duplicating any source reads or target apply processes.
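The idea of tracking both positions can be sketched in Python. This is a hypothetical model, not Replicate's implementation: `StreamPositions` and `resume` are illustrative names, plain integers stand in for real stream positions (whose format depends on the source endpoint), and changes are applied one at a time rather than in batches.

```python
from dataclasses import dataclass


@dataclass
class StreamPositions:
    # Integers stand in for real stream positions, whose format depends on the endpoint.
    last_read: int = 0
    last_applied: int = 0


def resume(source_log, stored_on_disk, pos, target):
    """Hypothetical resume logic that avoids duplicate reads and applies.

    source_log: (position, change) pairs in source order.
    stored_on_disk: changes already read but not yet applied, in order.
    target: list standing in for the target database.
    """
    # Apply changes read before the stop, skipping any already applied.
    for position, change in stored_on_disk:
        if position > pos.last_applied:
            target.append(change)
            pos.last_applied = position
    # Read only source changes after the last position read.
    for position, change in source_log:
        if position > pos.last_read:
            pos.last_read = position
            target.append(change)
            pos.last_applied = position


source_log = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
# Before the stop: positions 1-3 were read, but only position 1 was applied.
pos = StreamPositions(last_read=3, last_applied=1)
target = ["a"]
stored_on_disk = [(2, "b"), (3, "c")]

resume(source_log, stored_on_disk, pos, target)
resume(source_log, stored_on_disk, pos, target)  # a second resume adds nothing
print(target)  # ['a', 'b', 'c', 'd']
```

Because both positions are checked before reading or applying, calling `resume` a second time leaves the target unchanged, which mirrors the "no duplicate reads or applies" guarantee described above.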

If I use the same source endpoint in another task, will the Incoming Changes count include the CDC changes from the other task too?

No, the monitoring tool UI that you see for the task only contains information pertaining to the single task you are viewing. Each task has its own separate set of information regarding the changes for the task.

Environment

Qlik Replicate

The information in this article is provided as-is and is to be used at your own discretion. Depending on the tool(s) used, customization(s), and/or other factors, ongoing support on the solution below may not be provided by Qlik Support.

nareshkumar · 2024-12-12

Hi Alan/Dana,

My process is hung because Accumulating - On Disk holds over 1.3 billion records, and data is not flowing to the target. How can we process this data into the target? Is there any way to catch up on the massive transactions that are in Accumulating - On Disk?

Dana_Baldwin · 2024-12-12

Hi @nareshkumar

There can be many reasons why data is not flowing. It would be best to open a support case and provide a diagnostics package and enhanced log files in order to troubleshoot the issue.

a. On the task's Monitoring tab, in the Tools drop-down menu, select Log Management.
b. On the screen that opens, please ensure the box "Store trace/verbose logging in memory, but if an error occurs write to the logs" is NOT checked. Scroll down to the PERFORMANCE, SOURCE_CAPTURE, and TARGET_APPLY components and move each one position to the right, to Trace. Click OK.
c. The change will take effect immediately (no need to stop/resume the task).
d. The task log will be fairly large, so limit this to only 20-30 minutes before returning these logging levels to Info. Then zip the log file (not in a diagnostics package, as the log will be truncated) and upload it to the case.
