Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
The objective of this document is to provide an overview of the SQL Server replication techniques available for Qlik Replicate and assist the reader to determine the best approach for a particular use case.
Replication is a set of technologies for copying and distributing data and database objects from one database to another and then synchronizing between databases to maintain consistency. There are two replication methods supported by Qlik Replicate – MS-REPLICATION based and MS-CDC.
Qlik CDC based on the MS-REPLICATION method leverages the transactional replication mechanism available in the SQL Server. Transactional replication is implemented by the SQL Server Snapshot Agent, Log Reader Agent, and Distribution Agent. The Snapshot Agent prepares snapshot files containing schema and data of published tables and database objects, stores the files in the snapshot folder, and records synchronization jobs in the distribution database on the Distributor.
In the transactional replication the Log Reader Agent monitors the transaction log of each database configured for transactional replication and copies the transactions marked for replication from the transaction log into the distribution database, which acts as a reliable store-and-forward queue. The Distribution Agent copies the initial snapshot files from the snapshot folder and the transactions held in the distribution database tables to Subscribers.
Having said that, Qlik replicate does not explicitly use any of the MS-Replication mechanisms (neither agents nor the distributor), but rather leverages the SQL Server publications and articles, to write additional data about DML and DDL operations into SQL Server TLOG. This information will be later read and then parsed by Qlik Replicate to be distributed to the target endpoint of your choice.
Transactional logs can be read using several Change Processing Modes:
If Replicate cannot find the changes in the backup transaction logs, it will look for them in the online transaction logs instead.
There are few different methods available in Qlik Replicate to read the SQL Server Transaction logs:
Please review Qlik Replicate documentation, chapter “Using Microsoft SQL Server as a source” à “Setting advanced connection properties”
The image below depicts a subset of MS-REPLICATION which is utilized by Qlik Replicate. This article does not cover the complete MS-REPLICATION process.
In Qlik’s specific architecture, a database will be configured for replication using the standard Studio Wizard. As part of this configuration, the distribution database will be created on the publisher instance. When Qlik Replicate will set up the publication and the articles, it will also add an always-false filter (1=0) to prevent any changes be written into the Distribution Database. The Distribution Database will however contain stored procedures and settings used to allow the replication. Log Reader Agent will read the transactional log and mark the activity as replicated to allow active TLOG truncation.
As a next step Qlik replicate will access the log either using internal SQL Server functions (fn_dblog \ fn_dump_dblog) or direct TLOG backup access to read the content of the log and parse it inside Replicate to extract the DML\DDL activity.
BULK LOGGED or FULL Recovery Model is required. Full backup to be executed after changing the recovery model
Change Data Capture, also known as CDC, introduced the first time in SQL Server 2008 version, as a helpful feature to track and capture the changes that are performed on the SQL Server database tables, with no additional programming efforts. Before SQL Server 2016, Change Data Capture could be enabled on a SQL Server database only under the SQL Server Enterprise edition, which is not required starting from SQL Server 2016.
Change Data Capture tracks the INSERT, UPDATE and DELETE operations on the database table, and records detailed information about these changes in a mirrored table, with the same columns structure of the source tables, and additional columns to record the description of these changes. SQL Server writes one record for each INSERT statement showing the inserted values, one record for each DELETE statement showing the deleted data and two records for each UPDATE statement, the first one showing the data BEFORE the change and the second one showing the data AFTER performing the change.
Change Data Capture requires that a SQL Server Agent be running on a SQL Server instance. When the feature is enabled on a SQL Server database table, two SQL Server Agent jobs will be created for that database; the first job is responsible for populating database change tables with the changes information and the second job is responsible for the cleaning up the change tables by deleting the records older than the configurable retention period of several days.
In this context Qlik replicate will access the Change Tables on the source to get all the changes since the last iteration.
To find out more about Active Guard MS-CDC specifics, please review the dedicated Miscrosoft Documentation page
There are some limitations that come together with this method. Those include:
These limitations of MS-CDC are documented both by Microsoft and Qlik
In the case that MS-Replication and MS-CDC are both enabled on the server there will be no Capture job running on the source database. The job of capturing the changes and writing to the change tables will be done by the replication log-reader job. Qlik Replicate will also need special configuration to handle that type of setup. E.g. adding the internal parameter “skipMscdcJobFitnessCheck” set to “True” in the source endpoint advanced settings. Also note that you cannot use the truncation prevention method ‘Exclusively use sp_repldone within a single task’ as this stops the log-reader job
After having understood the 2 approaches, the chapter below identifies scenarios where MS-CDC based replication will be preferred over the MS-REPLICATION
Article - an article is the basic unit of SQL Server Replication. An article can consist of tables, stored procedures, and views. It is possible to scale the article, horizontally and vertically using a filter option. We can also create multiple articles on the same object with some restrictions and limitations.
Publication - a publication is a logical collection of articles from a database. The entity allows us to define and configure article properties at the higher level so that the properties are inherited to all the articles in that group.
Publisher database - The publisher is a database that contains a list of objects that are designated as SQL Server replication articles are known as publication database. The publisher can have one or more publications. Each publisher defines a data propagation mechanism by creating several internal replication stored procedures.
Publisher - The Publisher is a database instance that makes data available to other locations through SQL Server replication. The Publisher can have one or more publications, each defining a logically related set of objects and data to replicate.
Distributor - The Distributor is a database that acts as a storehouse for replication specific data associated with one or more Publishers. In many cases, the distributor is a single database that acts as both the Publisher and the Distributor. In the context of SQL Server replication, this is commonly known as a “local distributor”. On the other hand, if it’s configured on a separate server, then it is known as a “remote distributor”. Each Publisher is associated with a single database known as a “distribution database” aka the “Distributor”.
The distribution database identifies and stores SQL Server replication status data, metadata about the publication, and, in some cases, acts as a queue for data moving from the Publisher to the Subscribers.
Distribution databases - Each Distributor must have at least one distribution database. The distribution database consists of article detail, replication meta-data and data. A Distributor can hold more than one distribution database; however, all publications defined on a single Publisher must use the same distribution database.
Replication agents - SQL Server replication uses a pre-defined set of standalone programs and events are known as agents, to carry out the tasks associated with data. By default, SQL Server replication agents run as scheduled jobs under SQL Server Agent. Replication agents can also be run from the command line and by applications that use Replication Management Objects (RMO). SQL Server replication agents can be monitored and administered using Replication Monitor and SQL Server Management Studio.
Replication snapshot Agent - the Replication snapshot Agent is used with all types of SQL Server replication technology as it provides the required data set to perform the initial data synchronization of the publication database with the subscription database. It prepares schema and initial data of published articles, snapshot files, and records information about the synchronization type in the distribution database.
Log Reader Agent - the Log Reader Agent is used only with transactional replication. It moves replication transactions from the online transaction log of the publication database to the distribution database.
Distribution Agent - the Distribution Agent is used only with Replication snapshot and Transactional SQL Server replication. This agent applies the initial replication snapshot to the subscription database and later, the data changes are tracked and recorded in the distribution database and applied to the subscription database.
Merge Agent - the Merge Agent is used with the merge replication model. By default, the Merge Agent uploads changes from the Subscriber to the Publisher and then downloads changes from the Publisher to the Subscriber. Each subscription has its own Merge Agent that connects to both the Publisher and the Subscriber and updates both. The Merge Agent runs at either the Distributor for push subscriptions or the Subscriber for pull subscriptions. Here, the synchronization is bi-directional. The data conflicts are handled by a set of triggers that supports the entire process
fn_dblog() - is one of several undocumented functions for SQL Server; it allows you to view the transaction log records in the active part of the transaction log file for the current database
fn_dump_dblog() - is another undocumented function used to analyze the transaction log backup file of the database
Replication, change tracking, & change data capture - Always On availability groups
SQL Server replication: Overview of components and topography