Skip to main content
Announcements
UPGRADE ADVISORY for Qlik Replicate 2024.5: Read More
cancel
Showing results for 
Search instead for 
Did you mean: 
MoeyE
Partner - Creator III
Partner - Creator III

SQL Server AlwaysOn Availability Group Publication Recommendations

Hi,

I'm working with a SQL Server AlwaysOn cluster and multiple databases. I noticed some issues with the task after failing over. The cause may be due to the DBID of the databases differing between the nodes. 

MoeyE_3-1688710468835.png

MoeyE_4-1688710479182.png

 

I tested this by running the task for the Boats db in node 1 then failing over to node 2 and running the task for the Cars db. This created two duplicate publications with the names AR_PUBLICATION_00008. From here errors kept popping up.

After multiple tests I concluded that creating all the publications on one node before failing over seems to fix the issues. I couldn't find anything on this in the documentation or anyone that has experienced this before. Any input or extra knowledge on this will be great. Thanks.

Kind regards,

Mohammed

 

 

Labels (1)
1 Solution

Accepted Solutions
MoeyE
Partner - Creator III
Partner - Creator III
Author

Hi Mahendra,

From what I remember about this issue, it turned out to be a Availability Group setup issue. Main things to check I think would be that you are connecting to the SQL Cluster listener and that when a publication is created on the primary, check your secondary instance to make sure that the publication has also been created there. If it isn't there's your problem. However there are other contributing factors to check so here are some extra notes that I created on the matter that should hopefully help you get a better understanding to fix your issue. 

Purpose of Publications

The main purpose of publications is to allow CDC to occur. For example, UPDATE will be ignored without publications. They also allow DDL events like ALTER table. 
 

Duplicate Publications

Upon switching back to the original node after a failover, if the original publication is still there on the original node, a new publication will be created. The publication's number may start out as the DBID but the subsequent publication names appear to be arbitrary. This is normal behaviour from what I know and shouldn’t negatively affect the task. More information here: https://community.qlik.com/t5/Qlik-Replicate/Duplicate-publication-created-on-failover-of-Always-on-...
 

Replicate not creating the Publications

To run this task successfully without a sysadmin user please double check and ensure that all the steps in the link below were followed correctly. Also, please double check that the correct permissions are set for the user in step 3 in the link below.
 
Please also ensure to follow these steps in this next link here if using a non sysadmin user in the SQL endpoint https://community.qlik.com/t5/Official-Support-Articles/Qlik-Replicate-SQL-replication-jobs/ta-p/191...
 

AG Recommended setup

In a failover environment, you can manually create a publication for the primary node, then it will be synced to other nodes. As publication is just part of the database, if the database is being synced, the publication is being synced too. The reason of the problem is the failover environment was not set up properly and the database was not synced. So when a publication is created on the primary node, it should also be synced to the other nodes and when a failover occurs the task shouldn't have any issues. Please ensure that the failover and Replicate setup are correct first. I believe that this is where an issue may lie. To sum this up, make sure that when a publication is created in your primary instance, check the secondary instance and make sure that the publication has also been created there too. I believe that this is where our issue was.

AlwaysOn Listener

Just in case you haven't done so already, please make sure that in the SQL source endpoint, you are connecting to the SQL cluster using an AlwaysOn Listener as recommended in the Replicate documentation. Qlik doesn't recommend connecting to a node directly. 
 
 
MoeyE_3-1697495258162.pngMoeyE_4-1697495274773.png

 

 
 

 

 

 

View solution in original post

8 Replies
PGN
Creator II
Creator II

You're not specific on what errors are happening.  When you have an AOG in MS-SQL, it's best practice to have a separate distributor server. 

Also, the publications that are created are to mark what tables (articles) are being replicated.  When on Node A and a transactional task is created, a publication is created as you listed on Node A.    When you failover another publication is created on Node B and the one on Node A will more than likely have the articles removed.  So when you switch back to Node A, you will have two publications, one with articles and one without.  It's safe to remove the empty publications on either Node.  It's not really that harmful if you accidentally remove the "wrong" publication since Qlik will recreate them as needed.  The other links that Suresh listed will also provide some more insight I'm sure.

MoeyE
Partner - Creator III
Partner - Creator III
Author

Hi,

When you say that it's best practice to have a separate distributor server, I am confused as the documentation says to configure it so that it acts as its own distributor?

I have 2 nodes and setup distribution on each of them. Is this setup incorrect?

Thanks,

Mohammed

Kashana
Contributor II
Contributor II

Mohammed, 

Did you find a solution to this. I also ran into same issue.

Thanks,

Mahendra

MoeyE
Partner - Creator III
Partner - Creator III
Author

Hi Mahendra,

From what I remember about this issue, it turned out to be a Availability Group setup issue. Main things to check I think would be that you are connecting to the SQL Cluster listener and that when a publication is created on the primary, check your secondary instance to make sure that the publication has also been created there. If it isn't there's your problem. However there are other contributing factors to check so here are some extra notes that I created on the matter that should hopefully help you get a better understanding to fix your issue. 

Purpose of Publications

The main purpose of publications is to allow CDC to occur. For example, UPDATE will be ignored without publications. They also allow DDL events like ALTER table. 
 

Duplicate Publications

Upon switching back to the original node after a failover, if the original publication is still there on the original node, a new publication will be created. The publication's number may start out as the DBID but the subsequent publication names appear to be arbitrary. This is normal behaviour from what I know and shouldn’t negatively affect the task. More information here: https://community.qlik.com/t5/Qlik-Replicate/Duplicate-publication-created-on-failover-of-Always-on-...
 

Replicate not creating the Publications

To run this task successfully without a sysadmin user please double check and ensure that all the steps in the link below were followed correctly. Also, please double check that the correct permissions are set for the user in step 3 in the link below.
 
Please also ensure to follow these steps in this next link here if using a non sysadmin user in the SQL endpoint https://community.qlik.com/t5/Official-Support-Articles/Qlik-Replicate-SQL-replication-jobs/ta-p/191...
 

AG Recommended setup

In a failover environment, you can manually create a publication for the primary node, then it will be synced to other nodes. As publication is just part of the database, if the database is being synced, the publication is being synced too. The reason of the problem is the failover environment was not set up properly and the database was not synced. So when a publication is created on the primary node, it should also be synced to the other nodes and when a failover occurs the task shouldn't have any issues. Please ensure that the failover and Replicate setup are correct first. I believe that this is where an issue may lie. To sum this up, make sure that when a publication is created in your primary instance, check the secondary instance and make sure that the publication has also been created there too. I believe that this is where our issue was.

AlwaysOn Listener

Just in case you haven't done so already, please make sure that in the SQL source endpoint, you are connecting to the SQL cluster using an AlwaysOn Listener as recommended in the Replicate documentation. Qlik doesn't recommend connecting to a node directly. 
 
 
MoeyE_3-1697495258162.pngMoeyE_4-1697495274773.png

 

 
 

 

 

 

Kashana
Contributor II
Contributor II

I was able to start the task without any issue after 30 minutes. not sure what was stopping task not to come up for around 30 minutes after the failover of database from secondary to primary node in a cluster. When I started the task with advanced option it came up without any issue. Not sure if this kind of step need to be performed when ever there is a DB failover from one node to another node and CDC stops working.

MoeyE
Partner - Creator III
Partner - Creator III
Author

Hi,

This is what I remember of running tasks with my sql cluster setup. After failing over to the secondary database, the task would stop for a short time then resume by itself. For me it was no longer than 1 minute. I believe this is the short time taken by the primary to actually switch over to the secondary.

Are you saying that after failover the task stops completely and you must wait 30 minutes before manually starting it again? What Advanced option did you use to start it.

Also, if these issues persist please consider opening a support case directly with Qlik. I'll try help the best I can for now. 

Regards,

Mohammed

Kashana
Contributor II
Contributor II

Thank you for all your assistance. I had already opened a case with Qlik.