Ayman1
Contributor II

Inquiry About Using Aurora Read Replica for Replication to Snowflake

We are currently planning to set up data replication from our Aurora PostgreSQL & Aurora MySQL databases to Snowflake using Qlik Replicate (Attunity).

However, we have a specific requirement and constraint:

  • Requirement: Replication from a read replica.
  • Constraint: Replicating directly from the master database is not an option.

We understand that binary logs (binlogs) are necessary for change data capture (CDC) to function correctly. Given our constraints, we would like to know if it is possible to achieve replication using a read replica of some sort (either for PostgreSQL or MySQL) instead of the master database.

Specifically, we would like to understand:

  1. Can Qlik Replicate use a read replica for CDC purposes?
  2. Are there any specific configurations or setups required to enable this functionality using a read replica?
  3. Are there any known limitations or considerations we should be aware of when using a read replica for this purpose?

Thanks!

3 Replies
john_wang
Support

Hello @Ayman1 ,

Thanks for reaching out to Qlik Community!

Qlik Replicate is a log-based Change Data Capture (CDC) product that retrieves changes from database transaction logs (binlog in MySQL, WAL in PostgreSQL). If the necessary conditions are met, it can obtain changes from a secondary replica node in both PostgreSQL and MySQL. However, there are some limitations, and the feasibility depends on the specific configurations of PostgreSQL and/or MySQL.

PostgreSQL: PostgreSQL 16.x supports logical decoding on a standby (read replica), provided the WAL level is set to 'logical'. Qlik Replicate 2024.5 supports PostgreSQL 16.x as a source endpoint; we have a discussion about this topic. The limitation is that DDL changes cannot be captured from the replica node, so DDL replication from a read replica is not supported by Qlik Replicate; DML operations are supported.
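As a rough sketch of what is involved on the PostgreSQL side (the slot name and plugin below are illustrative; Qlik Replicate normally manages its own replication slot):

```sql
-- On the primary (wal_level propagates to the standby via the WAL stream):
ALTER SYSTEM SET wal_level = 'logical';
-- After a restart, on the PostgreSQL 16.x standby, verify the prerequisites:
SHOW wal_level;              -- should report 'logical'
SELECT pg_is_in_recovery();  -- 't' confirms this node is a standby
-- PostgreSQL 16+ allows creating a logical replication slot on a standby:
SELECT pg_create_logical_replication_slot('replicate_slot', 'test_decoding');
```

Note that `hot_standby_feedback` is generally recommended on the standby so the primary does not remove catalog rows the logical slot still needs.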


MySQL: If the replica node has log_slave_updates enabled, the binlog is written on the replica node and Qlik Replicate can capture changes from that binlog. This is especially straightforward when the master-replica replication is set up with GTID (Global Transaction ID) mode.
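Illustratively, the relevant my.cnf settings on the replica would look like the following (the binlog base name is a placeholder; a server restart is required for log_bin and log_slave_updates, and gtid_mode must be stepped up gradually on a live cluster):

```ini
[mysqld]
log_bin                  = mysql-bin  # replica writes its own binlog
log_slave_updates        = ON         # replicated changes also go into that binlog
gtid_mode                = ON         # globally unique transaction IDs
enforce_gtid_consistency = ON
binlog_format            = ROW        # row-based logging, required for log-based CDC
```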

The above conclusion is based on technical analysis. We strongly recommend engaging Professional Services (PS) to conduct a thorough acceptance test before implementing it in the production system.

Hope this helps.

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!
lcer
Contributor

Hello sir, I have a question about MySQL replication. We use MySQL as the replication source, but MySQL itself is protected by a dual-master + LVS architecture, and the replication source connects through a VIP. When a VIP switchover is performed and the source now points at the other master, the Qlik Replicate task log reports error 1236 (Could not find first log file name in binary log index file). Both log_slave_updates and gtid_mode are enabled on both nodes; the only inconsistency is that the binlog file naming sequence differs between the two master instances.

My question: does this product have a function similar to recording a gtid_set? In native GTID mode, the replica sends the union of its retrieved_gtid_set and executed_gtid_set to the master, which filters and sends the missing transactions back to the replica. If such a function exists, please let me know which version introduced it, or whether it is planned for a future release, or whether my way of thinking is mistaken. Thank you.

john_wang
Support

Hello @lcer ,

Thanks for the update.

I'm not entirely sure about your MySQL cluster setup, but if GTID (Global Transaction ID) is enabled and enforce_gtid_consistency is set to ON with the following configurations:

gtid_mode = on

enforce_gtid_consistency = on

then it should be possible to resume a task from the correct "Pos" position after a failover. Thanks to GTID, each transaction has a unique ID that is retained across both the primary and secondary nodes, even though the binlog file names and "Pos" values may differ between them. That difference is why you got error 1236 (Could not find first log file name in binary log index file) when simply resuming the task.

To find the correct "Pos" after a switchover, you can run SHOW BINLOG EVENTS IN '&lt;binlog file&gt;' on the new primary and identify the appropriate "Pos" by matching a specific GTID. This will allow you to resume the task from that position.
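A sketch of that lookup (the binlog file name and GTID value below are placeholders, not real values from your cluster):

```sql
-- List the binlog files available on the new primary:
SHOW BINARY LOGS;
-- Inspect events in a candidate file:
SHOW BINLOG EVENTS IN 'mysql-bin.000042' LIMIT 200;
-- In the output, locate the Gtid event for the last transaction the task
-- processed, e.g.:
--   Gtid | ... | SET @@SESSION.GTID_NEXT= '3e11fa47-71ca-11e1-9e33-c80aa9429562:12345'
-- The "Pos" of the transaction that follows it is the position to resume from.
```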

I'd like to suggest PS engaged if you need additional assistance around this.

Hope this helps.

John.
