5 Tips for Replicate task configuration with MS SQL server as Source

Sonja_Bauernfeind
Digital Support

Last Update:

Mar 22, 2021 11:41:59 AM

Updated By:

Sonja_Bauernfeind

Created date:

Mar 22, 2021 11:41:59 AM

 

Environment:

Qlik Replicate 

 

Tip 1: Transaction vs Batch mode

Batch Apply is the default setting because it is much faster than Transactional Apply.

Transactional Apply mode is generally slower than Batch Apply mode because the task replays the source TLog transactions exactly as they are ordered in the source TLog: each update to a record is applied to the target one statement at a time.

Batch Apply mode performs many optimizations on the transactions it reads from the source TLog. For example, several statements against one table can be combined into a single statement: an Insert followed by a couple of Update statements becomes a single Insert, and several Update statements can be merged into one. Most importantly, Batch Apply applies all the records in the batch as a single command to the target, instead of hundreds, if not thousands, of individual statements.
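As a toy illustration of the coalescing idea described above (a sketch only, not Replicate's actual implementation), a batch optimizer might fold a stream of change events per key before applying them:

```python
def coalesce(changes):
    """Fold a stream of (op, key, values) change events per key.

    INSERT followed by UPDATE(s) -> one INSERT carrying the final values
    UPDATE followed by UPDATE(s) -> one UPDATE carrying the final values
    """
    merged = {}  # key -> (op, values), insertion-ordered
    for op, key, values in changes:
        if key in merged:
            prev_op, prev_vals = merged[key]
            new_vals = {**prev_vals, **values}
            # An UPDATE after an INSERT stays an INSERT (the row is new on the target)
            merged[key] = (prev_op if prev_op == "INSERT" else op, new_vals)
        else:
            merged[key] = (op, values)
    return [(op, key, vals) for key, (op, vals) in merged.items()]

events = [
    ("INSERT", 1, {"name": "a", "qty": 1}),
    ("UPDATE", 1, {"qty": 2}),
    ("UPDATE", 1, {"qty": 3}),
    ("UPDATE", 2, {"qty": 9}),
]
print(coalesce(events))
# [('INSERT', 1, {'name': 'a', 'qty': 3}), ('UPDATE', 2, {'qty': 9})]
```

Four source statements collapse into two target statements, and a real batch apply would then send each merged batch as one command rather than row by row.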

 

Tip 2: Log Levels for task

There is very little impact on performance or task log file size from keeping the following two components permanently set to trace level logging:

PERFORMANCE and TARGET_APPLY

With these preset to trace, the task log will already contain information that can help when troubleshooting latency.

 

Tip 3: Best practice for tasks during source maintenance window MS SQL

Reindexing a SQL Server source presents some issues for Replicate tasks that need to be considered when planning the SQL Server maintenance window.

The most important thing to realize about an index reorg or rebuild maintenance window is that the task cannot be restarted with a timestamp value from before the reindex job started. The original partition IDs will be lost by the task, which will require a full load of any tables involved in the re-index.

The next thing to be aware of is that when a source database goes through a re-index maintenance window, the Replicate task will build up latency during the reindex job. This latency results from either of the two strategies for handling a re-index (1. stop and resume the task, or 2. keep the task running).

Both points come down to how index reorgs change partition IDs, and how slow it is to read backup TLogs.

Possible mitigating steps:

  1. Keep the task running during the re-index job - not recommended, as it is usually very slow to read the TLogs
  2. Stop the task and resume it after the re-index is complete - recommended method
  3. Disable source end users - stop the task - do the re-index maintenance - restart the task with a timestamp after the maintenance job completes - re-enable source end users
  4. Disable backups during the maintenance window - keep everything in the online TLog
  5. Set the task to use SQL Server direct access (bFile) - no source compression allowed
  6. Use LiteSpeed as the third-party backup - Replicate can read it quicker

Additional information contributing to the slowdown:

  1. Internally, the t-log transactions are stored by partition ID, and during some maintenance activities, like rebuilds, the partition ID is changed. The list of partition IDs Replicate is looking for appears in the WHERE predicate of the SQL statement it issues.
  2. Replicate will start reading the t-log backup with the currently known list of partition ids.
  3. Replicate is also looking for the partition change DDL transactions.
  4. Since the list of partitions it is looking for has changed, Replicate has to:

    1. Stop the currently running query.
    2. Define the new list of partitions to look for.
    3. Start the query again.
    4. Since the t-log backup is not ‘keyed’, it has to scan the entire log to get back to the LSN it was processing (this is the killer).
    5. This happens each time a partition switch occurs, which could be many times, depending on the number of tables.

  5. If the task is stopped, it stores the list of currently known partitions to disk.
  6. When the task is resumed (not a fresh start), the list is read from disk, and the database is then queried to get all the new partition IDs.
    • That is why you are asked to stop the task and then resume it after the maintenance is done, so Replicate can start reading the t-log with all known former and current partition IDs.
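A back-of-envelope model of the rescan cost described above (an assumption for illustration, not Replicate internals): because the backup t-log is not keyed, each partition switch forces the reader to restart from record 0 and scan forward to the LSN it had already reached.

```python
def scan_cost(total_records, switch_positions):
    """Records scanned = one full pass plus a rescan up to each switch point."""
    return total_records + sum(switch_positions)

# A 10M-record backup where 50 tables each switch partitions halfway through:
print(scan_cost(10_000_000, [5_000_000] * 50))  # 260000000, i.e. 26x a single pass
```

Even a modest number of partition switches can multiply the read cost many times over, which is why resuming the task after maintenance (with a fresh partition list) is the recommended approach.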

 

Tip 4: Transformation SQLite syntax and documentation

Replicate uses SQLite syntax in the transformation and filter sections of a task's table settings. Qlik does not maintain documentation on this, so to read about a particular supported function or operation, please refer to the SQLite documentation online.
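Because the expressions are plain SQLite syntax, one quick way to check a candidate expression before putting it into a transformation is Python's built-in sqlite3 module. The table and column names below are just a local stand-in for a source table, not Replicate's own notation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for one source row
conn.execute("CREATE TABLE t (first_name TEXT, last_name TEXT)")
conn.execute("INSERT INTO t VALUES ('Ada', 'Lovelace')")

# Candidate transformation expression, written in SQLite syntax
expr = "substr(first_name, 1, 1) || '. ' || upper(last_name)"
print(conn.execute(f"SELECT {expr} FROM t").fetchone()[0])  # A. LOVELACE
```

If the expression evaluates as expected here, the same SQLite functions and operators should behave the same way inside the transformation editor.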

 

Tip 5: Prevent truncation of unread changes from TLOG

There are two options to choose from when setting up the MS SQL server source endpoint.

 

Option 1: Start transactions in the database

Note: This option must be used if more than one task uses the same source database.

The safeguard mechanism opens 2 transactions in the source database to prevent the active log from truncating changes that have not been read yet. The reason is that if the active log truncates changes while Replicate is reading from a nearby location in the TLog, those events could be lost.

So, the safeguard will open those transactions to prevent truncation.

It opens both a "rear latch" transaction, which marks the oldest record that has not been read, and a "front latch" transaction, which is created at "now" time every time the safeguard moves.

So when the safeguard moves, Replicate commits the rear latch transaction and opens a new transaction at "now", which becomes the new front latch, and the old front latch becomes the new rear latch.

Replicate will move the safeguard if and only if:

  1. Replicate has read the changes past the rear latch position (LSN)
  2. The changes that were read were confirmed by the sorter, meaning those changes were either applied to the target or saved to disk as swap files. This ensures that, in case of a task error, Replicate does not move on and lose changes that are not yet on the target or saved to disk.

The safeguard parameter of 5 minutes tells Replicate how often to check whether the safeguard can be moved.

It does not mean that the safeguard will move every 5 minutes: Replicate may check and find that it has not yet read past the rear latch, in which case it cannot move the safeguard.

So, with the safeguard option, you can potentially have long-running transactions on the database, and the active log can get large and even fill up.

This is actually normal behavior, as these transactions are left open to ensure the transaction log is not truncated before all changes are read from the source.
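The latch rotation and its two preconditions can be modeled in a few lines (a toy model of the behavior described above, not Replicate's code):

```python
class Safeguard:
    """Toy model of the safeguard's rear/front latch transactions."""

    def __init__(self, now_lsn):
        self.rear = now_lsn    # marks the oldest record not yet read
        self.front = now_lsn   # opened at "now" time

    def try_move(self, read_lsn, confirmed_lsn, now_lsn):
        """Called on the check interval (e.g. every 5 minutes)."""
        # Move only if reading has passed the rear latch AND the sorter has
        # confirmed those changes (applied to target or swapped to disk).
        if read_lsn <= self.rear or confirmed_lsn <= self.rear:
            return False
        self.rear = self.front   # commit rear; old front becomes the new rear
        self.front = now_lsn     # open a new "now" front latch
        return True

sg = Safeguard(now_lsn=100)
print(sg.try_move(read_lsn=90, confirmed_lsn=90, now_lsn=130))    # False: not past rear yet
print(sg.try_move(read_lsn=150, confirmed_lsn=150, now_lsn=160))  # True: safeguard moves
```

The first check fails exactly as described in the text: a check interval can elapse without the safeguard moving, because reading has not yet passed the rear latch.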

 

Option 2: Exclusively use sp_repldone within a single task

For this option, the task hands off the whole "safeguard" process to the SQL Server Log Reader Agent job.

One big issue with this option is that when the task is stopped, the Log Reader will not free up the TLog for truncation. So if a task is down for a couple of days, the source TLog can grow very large.
