From: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2022-07-05 13:50:20
Message-ID: CAGPVpCTq=rUDd4JUdaRc1XUWf4BrH2gdSNf3rtOMUGj9rPpfzQ@mail.gmail.com
Lists: pgsql-hackers
Hi hackers,
I created a patch to reuse tablesync workers and their replication slots
for syncing further tables that are not yet synced, so that the overhead
of creating and dropping workers/replication slots can be reduced.
The current version of logical replication has two steps: tablesync and apply.
In the tablesync step, the apply worker creates a tablesync worker for each
table, and those tablesync workers exit when they are done with their
associated table. (The number of tablesync workers running at the same time
is limited by "max_sync_workers_per_subscription".)
Each tablesync worker also creates a replication slot on the publisher during
its lifetime and drops that slot before exiting.
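(These per-table sync slots can be observed on the publisher while the
initial copy runs; the LIKE pattern below assumes the usual
"pg_<subid>_sync_<relid>_..." naming of tablesync slots:

SELECT slot_name, slot_type, active
FROM pg_replication_slots
WHERE slot_name LIKE 'pg\_%\_sync\_%';
)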
The purpose of this patch is to get rid of the overhead of
creating/killing a new worker (and replication slot) for each table.
It aims to reuse tablesync workers and their replication slots so that a
tablesync worker can copy multiple tables from the publisher to the
subscriber during its lifetime.
The benefit of reusing tablesync workers can be significant if tables are
empty or close to empty.
In the empty-table case, spawning tablesync workers and handling replication
slots is where most of the time is spent, since the actual copy phase takes
very little time.
The changes in the behaviour of tablesync workers with this patch are as
follows:
1- After a tablesync worker is done syncing its current table, it takes
a lock and fetches the tables in init state.
2- Among the tables in init state, it looks for one that is not already
being synced by another worker (roughly the catalog lookup sketched below).
3- If it finds one, it updates its state to the new table and loops back to
the beginning to start syncing.
4- If no such table is found, it drops the replication slot and exits.
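For illustration, the candidate tables in step 1 are the relations whose
pg_subscription_rel state is still 'i' (init). The patch does this lookup
in C against the catalogs, but a rough SQL equivalent is:

SELECT s.subname, sr.srrelid::regclass AS relation
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
WHERE sr.srsubstate = 'i';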
With those changes, I did some benchmarking to see whether it improves
anything.
These results compare this patch against the latest master branch.
"max_sync_workers_per_subscription" is set to its default of 2.
The timings below are averages over 5 consecutive runs for each branch.
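In case anyone wants to reproduce this, a minimal psql setup along these
lines should work (table names, counts, and the connection string are
illustrative placeholders, not my actual test scripts):

-- Publisher: create 10 empty tables and publish them:
SELECT format('CREATE TABLE t%s (a int)', i)
FROM generate_series(1, 10) AS i \gexec
CREATE PUBLICATION pub FOR ALL TABLES;

-- Subscriber: the same tables must already exist there. Timing is
-- measured from CREATE SUBSCRIPTION until every relation in
-- pg_subscription_rel reaches the ready ('r') state:
CREATE SUBSCRIPTION sub
    CONNECTION 'host=localhost port=5432 dbname=postgres'
    PUBLICATION pub;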
First, I tested logical replication with empty tables:
10 tables
----------------
- master: 286.964 ms
- the patch: 116.852 ms
100 tables
----------------
- master: 2785.328 ms
- the patch: 706.817 ms
10K tables
----------------
- master: 39612.349 ms
- the patch: 12526.981 ms
I also tried replicating tables with some data:
10 tables loaded with 10MB data
----------------
- master: 1517.714 ms
- the patch: 1399.965 ms
100 tables loaded with 10MB data
----------------
- master: 16327.229 ms
- the patch: 11963.696 ms
Then I loaded more data:
10 tables loaded with 100MB data
----------------
- master: 13910.189 ms
- the patch: 14770.982 ms
100 tables loaded with 100MB data
----------------
- master: 146281.457 ms
- the patch: 156957.512 ms
If tables are mostly empty, the improvement can be significant - up to 3x
faster logical replication in these runs.
With some data loaded, it can still be somewhat faster.
As table sizes grow further, the advantage of reusing workers becomes
insignificant.
I would appreciate your comments and suggestions. Thanks in advance for
reviewing.
Best,
Melih
Attachment: 0001-Reuse-Logical-Replication-Background-worker.patch (application/octet-stream, 33.3 KB)