Re: Replication slot for specific tables - Qlik Community

gayatri235 · ‎2024-02-26

Hello Team,

I am facing latency and replication slot growth issue for our source Postgresql.

We have this table , highly transacting. We removed this table from our replication task, still I see this table in our logs.

Can we create replication slot for specific tables only ?

I came across this pg_replication_sets which is available after postgresql 14 and later.

Can this be used ?

DesmondWOO · ‎2024-02-27

Hi @gayatri235 ,

Thank you for reaching out to the Qlik Community.

From my understanding, the WAL records all changes made to data file. I am afraid the answer is "no".

I am not sure what "pg_replication_sets" is. Could you share your idea in detail?

Regards,
Desmond

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!

john_wang · ‎2024-02-27

Hello @gayatri235 ,

Besides @DesmondWOO comments, I think you are meaning replication_set. In general spreading different tables to multiple slots does not help in reducing the WAL size. However it may introduce multiple slots/tasks to read the changing data from WAL in parallel (rather than in serial) from PostgreSQL, if the hardware/configuration and the IO/network are not bottleneck in PostgreSQL source sides. Maybe 2 slots help however we need tests in your environment, and in my experience please do not use too many slots eg 3+. Any feedback are welcome.

Thanks,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!

gayatri235 · ‎2024-02-27

@DesmondWOO , I found on this chat GPT.

-- Step 1: Create a logical replication set
SELECT pg_create_logical_replication_set('my_replication_set', '{"table1", "table2"}');

-- Step 2: Create a replication publication
CREATE PUBLICATION my_publication FOR TABLE my_replication_set;

-- Step 3: Create a logical replication slot associated with the replication publication
SELECT * FROM pg_create_logical_replication_slot('my_slot', 'pgoutput');

-- Optionally, you can specify the publication to use for the slot:
-- SELECT * FROM pg_create_logical_replication_slot('my_slot', 'pgoutput', 'my_publication');

Let me know , if this works ?

gayatri235 · ‎2024-02-27

@john_wang , Can you tell me how I can see bottleneck at source postgresql ?

Also, pg_replication_sets I am looking for an option which will help me optimize my replication process.

We are not replicating high transacting table but still replicate is reading that wal, I know we cant even skip that. But can you suggest me how I should tune my replication job .

john_wang · ‎2024-02-27

Hello @gayatri235 ,

I'm afraid the answer is No. These SQL do not work.

If you want to know how Replicate creates slot and how to use it, you may set SOURCE_CAPTURE to Verbose, run the task again then check the task log file.

Please take note that the slot will be created in the task first time run. If you want to get the slot creation SQL, please delete the slot manually then run the task again.

Hope this helps.

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!

gayatri235 · ‎2024-02-27

Thats ok, I am aware about the replicate slot creation process.

Kindly help me with task settings that I can do to faster the wal/slot processing at replicate end ?

john_wang · ‎2024-02-27

Hello @gayatri235 ,

When it comes to the bottleneck, I think we may narrow down it by:

1- Use the NULL target to run the task, if any latency

2- If NO in step (1) then probably target side, or the network between Replicate server and target side database server is the bottleneck.

3- If you are facing latency in step (1) then we need to tune the source PostgreSQL or the network between Replicate server and PostgreSQL.

If want to trace the whole path performance (with real source and target endpoints), please keep all other logging component to default (Information level) and keep PERFORMANCE to Trace, run the task again (keep it run at least 15 mins) then check the task log file to understand further. A typical output like:

2024-02-27T11:01:48 [PERFORMANCE ]T: Source latency 281.39 seconds, Target latency 2.03 seconds, Handling latency 2.64 seconds (replicationtask.c:3734)

You can see where is the bottleneck, source, target, or handling.

Also please check Latency / Performance Troubleshooting and Tuning for Replicate.

Hope this helps.

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!

gayatri235 · ‎2024-02-27

So we tried replicating to NULL today, we didnt see latency issue today.

But with Synapse there is latency when replication slot size grow.

We checked at target end, there were no queries in queued state or taking more than seconds for execution everything seems to be normal at target.

john_wang · ‎2024-03-03

Hello @gayatri235 ,

Looks to me the bottleneck is at network (between Replicate Server and Synapse) and/or Synapse side (eg if it swallows the data fast enough). And also if the compression is enabled in Replicate endpoint settings etc, anyway we need tuning according the hardware and software configurations. I suggest PS team help you.

Regards,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!

Replication slot for specific tables

Best Practices

CDC - 1-many - Log Stream

Configuration

Connectivity - Sources or Targets

Performance