topic Re: Considerations of having multiple Replicate servers reading the same source in Qlik Replicate

Considerations of having multiple Replicate servers reading the same source

MoeE — Tue, 27 Feb 2024 05:48:19 GMT

Hi team,

I need some guidance. What should be considered when one Replicate server already has almost 100 tasks reading from the same source endpoint, and another server will be added to read from that same source endpoint?

My current thoughts:

* Estimated number of tasks on the new server

* Estimated number and size of tables on new server

* Any LOBs in the tables

* Is there currently any source latency that indicates the source server is already under too much stress

What considerations am I missing? Thanks.

Regards,

Mohammed

Re: Considerations of having multiple Replicate servers reading the same source

aarun_arasu — Tue, 27 Feb 2024 05:55:57 GMT

Hello @MoeE ,

Thanks for reaching out to Qlik community.

I recommend considering a "logstream task" if you have multiple tasks reading from the same source.

Please refer to the user guide below:

https://help.qlik.com/en-US/replicate/November2023/Content/Replicate/Main/Log%20Stream%20Staging/intro.htm

Regards

Arun

Re: Considerations of having multiple Replicate servers reading the same source

aarun_arasu — Tue, 27 Feb 2024 07:54:42 GMT

Hello Team,

If our response has been helpful, please consider clicking "Accept as Solution". This will assist other users in easily finding the answer.

Regards,
Arun

Re: Considerations of having multiple Replicate servers reading the same source

Heinvandenheuvel — Tue, 27 Feb 2024 13:51:37 GMT

Let's take a step back... WHY do you think you should have 100 tasks reading from the same source endpoint?

IMHO the only valid reason is many different target endpoints.

Are you using Logstream already? You should.

With logstream in place, what is the indication that you might need a second Replicate server?

There is nothing wrong in using a second server, but you should have a solid reason.

Examples of bad reasoning:

- Our tasks are not allowed to have more than NN tables each.

- Our tasks are not allowed to mix source schemas because those represent different customers which must be kept separate. Yeah - no!

Re: Considerations of having multiple Replicate servers reading the same source

MoeE — Tue, 27 Feb 2024 22:28:52 GMT

Hi Hein,

Thanks for the answer. Yes, the reason is to reduce weight/stress on the current Replicate server. There is a new project which I believe will be large, which is why the need for more servers has come up. Also, the plan is to add a new target (Snowflake) to the new server. Yep, logstream is a no-brainer too; I'll ensure that this is configured efficiently.

Regards,

Mohammed

Re: Considerations of having multiple Replicate servers reading the same source

Heinvandenheuvel — Wed, 28 Feb 2024 13:30:54 GMT

Thanks for the clarification Mohammed.

But what I do NOT see is WHY you have hundreds of (CDC?) tasks planned for a single source endpoint.

This question is in the context of CDC tasks, right? For full-load, multiple tasks may well serve scheduling purposes.

Some folks come up with a silly, arbitrary rule that no CDC task shall have more than 100 tables, and with 5000 tables think they need 50 tasks. Nonsense! They need 1, maybe 5 tasks, but no more, and they will reduce the Replicate server overhead by a factor of 10 just by NOT having to re-read the change log (logstream or direct) over and over.
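The arithmetic behind that "factor of 10" can be sketched as follows. This is only a rough model, under the assumption that every CDC task makes its own independent pass over the change log, so the number of log passes equals the number of tasks. The function names are made up for illustration and are not part of Replicate.

```python
import math


def cdc_tasks_needed(total_tables: int, max_tables_per_task: int) -> int:
    """Tasks required if an arbitrary per-task table cap is enforced."""
    return math.ceil(total_tables / max_tables_per_task)


def log_read_reduction(tasks_before: int, tasks_after: int) -> float:
    """Ratio of change-log passes, assuming one independent pass per CDC task."""
    return tasks_before / tasks_after


# 5000 tables under a "no more than 100 tables per task" rule
capped = cdc_tasks_needed(5000, 100)

# Consolidated design from the post: "1, maybe 5 tasks"
consolidated = 5

print(capped)                                    # 50
print(log_read_reduction(capped, consolidated))  # 10.0
```

Going all the way down to a single task would make the ratio 50 under the same assumption; real overhead also depends on change volume and table filters, which this sketch ignores.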

What I also do NOT see is an indication of having multiple target endpoints, for which multiple tasks would be unavoidable. I only hear about maybe 1 more Snowflake target, not dozens. So why not just 1 task for everything you had and 1 more for Snowflake?

Push your customer for a good, concise answer before going for a knee-jerk "just add a server" 'solution' (workaround!).

Re: Considerations of having multiple Replicate servers reading the same source

DesmondWOO — Thu, 29 Feb 2024 02:30:32 GMT

Hi @MoeE ,

The execution of 100 CDC tasks will establish 100 connections for reading the transaction log. If these tasks are all directed towards the same database server, it could result in significant server load.

Regards,
Desmond

Re: Considerations of having multiple Replicate servers reading the same source

MoeE — Thu, 29 Feb 2024 22:27:17 GMT

Hi @DesmondWOO,

Yep, this makes sense. Please help my understanding. Is the main concern too many connections from Replicate, or is it too many tasks reading from one transaction log?

It's both right? Too many connections and too many reads to the same database's transaction logs are both causes of overhead?

Regards,

Mohammed

Re: Considerations of having multiple Replicate servers reading the same source

Heinvandenheuvel — Fri, 01 Mar 2024 00:26:15 GMT

@MoeE >>> yes : "It's both right? Too many connections and too many reads to the same database's transaction logs are both causes of overhead?"

Too many connections, all reading and interpreting the same transaction log, will cause too much overhead on the source server to deliver the data and too much overhead on the Replicate server to interpret that data over and over.

Re: Considerations of having multiple Replicate servers reading the same source

MoeE — Fri, 01 Mar 2024 00:32:12 GMT

Hi Hein,

Thanks. In the theoretical scenario where a server has 10 different databases and there are 10 different logstream staging tasks each reading one of the databases, there are 10 connections to the server but only 1 connection to each database, so my understanding is that not much overhead is created in this scenario.

So mainly the issue is too many connections on the same 1 database. Not too many connections on the server which contains these databases. Thanks for the help, it's truly appreciated.

Regards,

Mohammed

Re: Considerations of having multiple Replicate servers reading the same source

SushilKumar — Thu, 07 Mar 2024 08:06:25 GMT

Hello @MoeE

If you tell us more about the source endpoint involved, we can make suggestions and clear up the design confusion here.

Regards,

Sushil Kumar

Re: Considerations of having multiple Replicate servers reading the same source

Heinvandenheuvel — Thu, 07 Mar 2024 15:11:17 GMT

>> In the theoretical scenario where a server has 10 different databases and there are 10 different logstream staging tasks each reading one of the databases.

Correct, for 10 different databases you must have 10 different endpoints and 1 logstream task to read changes from each.

HOWEVER, this is a completely different scenario from the original discussion: "100 tasks reading from the same source endpoint", which is 99% certain to be the wrong design choice.

>> So mainly the issue is too many connections on the same 1 database. Not too many connections on the server which contains these databases.

INCORRECT. The issue (for the originally stated problem) is not the number of connections but rather that each CDC connection must actively read/poll the transaction log, which typically causes high, constant resource usage on the source server and network, as well as on the Replicate server. One CDC reader per database is unavoidable (so 10 in the new scenario, due to the 10 different databases), but anything beyond one per database should be minimized, typically by using logstream, but sometimes by just handling more and more tables in a single task.

>> Not too many connections

Each client task will still have all the normal 'full-load' connections to check the schema/table definitions, to be able to reload tables, and to be able to read LOBs. So even with logstream the total number of connections to the source will be more or less the same, but most of them will be inactive during regular operations. There will be fewer (1) active CDC streams, which is normally a big win.
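To make the distinction between connections and active log readers concrete, here is a minimal sketch. It is a simplified model, not Replicate's actual behaviour: it assumes each replication task holds one mostly idle 'full-load'/metadata connection, that without logstream every task is also an active log reader, and that with logstream there is exactly one staging task (one active reader) per database. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SourceLoad:
    active_log_readers: int  # streams continuously reading/polling the transaction log
    idle_connections: int    # metadata/reload/LOB connections, mostly inactive


def direct_cdc(tasks_per_database: Mapping[str, int]) -> SourceLoad:
    """Assumed model: every CDC task reads the transaction log itself."""
    total_tasks = sum(tasks_per_database.values())
    return SourceLoad(active_log_readers=total_tasks, idle_connections=total_tasks)


def with_logstream(tasks_per_database: Mapping[str, int]) -> SourceLoad:
    """Assumed model: one staging task per database reads the log for all its tasks."""
    total_tasks = sum(tasks_per_database.values())
    databases_in_use = sum(1 for n in tasks_per_database.values() if n > 0)
    return SourceLoad(active_log_readers=databases_in_use, idle_connections=total_tasks)


# Original scenario: ~100 tasks on one source database
original = {"source_db": 100}
print(direct_cdc(original))      # 100 active readers
print(with_logstream(original))  # 1 active reader, connection count unchanged

# Later scenario: 10 databases, one task each
ten_dbs = {f"db{i}": 1 for i in range(1, 11)}
print(with_logstream(ten_dbs))   # 10 active readers, one per database
```

In this model the connection count barely moves when logstream is introduced; what changes is the number of streams doing constant work against the transaction log.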

BTW, perhaps an interesting observation: for Oracle sources with pluggable databases (PDBs), the REDO/ARCH logs are actually shared and managed in the CDB. Replicate will end up reading the one and same physical log over and over, selecting CDC events for each V$INSTANCE number for each PDB.

Hein.