Re: Logical replication fails when adding multiple replicas

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: will(dot)roper(at)democracyclub(dot)org(dot)uk
Cc: houzj(dot)fnst(at)fujitsu(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: Logical replication fails when adding multiple replicas
Date: 2023-03-23 08:17:42
Message-ID: 20230323.171742.1357157542021128059.horikyota.ntt@gmail.com
Lists: pgsql-general

At Wed, 22 Mar 2023 09:25:37 +0000, Will Roper <will(dot)roper(at)democracyclub(dot)org(dot)uk> wrote in
> Thanks for the response Hou,
>
> I've had a look and when the tablesync workers are spinning up there are
> some errors of the form:
>
> "2023-03-17 18:37:06.900 UTC [4071] LOG: logical replication table
> synchronization worker for subscription
> ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onspd"" has
> started"
> "2023-03-17 18:37:06.976 UTC [4071] ERROR: could not create replication
> slot ""pg_37986_sync_37922_7210774007126708177"": ERROR: replication slot
> ""pg_37986_sync_37922_7210774007126708177"" already exists"

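The slot name format is "pg_<suboid>_sync_<relid>_<systemid>". It's no
surprise this happens if the subscribers were created from the same
backup, since they then share the same system identifier and can end
up with the same subscription and relation OIDs.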
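
To illustrate the collision, here is a minimal standalone sketch. It is
not the server code: tablesync_slot_name() and slot_names_collide() are
hypothetical helpers that only mirror the name format above, and the
values are taken from the log lines quoted earlier.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the slot name builder. It only reproduces
 * the "pg_<suboid>_sync_<relid>_<systemid>" format described above.
 */
static void
tablesync_slot_name(uint32_t suboid, uint32_t relid, uint64_t sysid,
                    char *buf, size_t bufsize)
{
    assert(buf != NULL && bufsize > 0);
    snprintf(buf, bufsize, "pg_%" PRIu32 "_sync_%" PRIu32 "_%" PRIu64,
             suboid, relid, sysid);
}

/*
 * Returns 1 if two subscribers syncing the same table would ask the
 * publisher for the same slot name, 0 otherwise.
 */
static int
slot_names_collide(uint32_t suboid_a, uint64_t sysid_a,
                   uint32_t suboid_b, uint64_t sysid_b,
                   uint32_t relid)
{
    char name_a[64];
    char name_b[64];

    tablesync_slot_name(suboid_a, relid, sysid_a, name_a, sizeof(name_a));
    tablesync_slot_name(suboid_b, relid, sysid_b, name_b, sizeof(name_b));
    return strcmp(name_a, name_b) == 0;
}
```

With suboid 37986, relid 37922 and system identifier 7210774007126708177
on both subscribers, the two names are identical, so the second CREATE
of the slot on the publisher fails. Changing any one of the three
components on one subscriber makes the names differ.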

If that's true, the simplest workaround would be to recreate the
subscription multiple times, using a different number of repetitions
for each subscriber so that the subscribers have subscriptions with
different OIDs.

I believe it's not prohibited for subscribers to have the same system
identifier, but the slot name generation logic for tablesync doesn't
account for cases like this. We might need some server-wide value
that's unique among subscribers and stable while table sync is
running. I can't think of a better place than pg_subscription, but I
don't like it because the value isn't really necessary for most of
the subscription's life.

Do you think using the postmaster's startup time would work for this
purpose? I'm assuming that the slot name doesn't need to persist
across server restarts, but I'm not sure that's really true.

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 07eea504ba..a5b4f7cf7c 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -1214,7 +1214,7 @@ ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
 								char *syncslotname, Size szslot)
 {
 	snprintf(syncslotname, szslot, "pg_%u_sync_%u_" UINT64_FORMAT, suboid,
-			 relid, GetSystemIdentifier());
+			 relid, PgStartTime);
 }
 
 /*

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
