
Re: Initial Schema Sync for Logical Replication

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	"Kumar, Sachin" <ssetiya(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Initial Schema Sync for Logical Replication
Date:	2023-04-20 11:16:38
Message-ID:	CAA4eK1LK0JcpuG=Hq5rU0uP5MqAxd59oYbt=igPHn6sadis6_A@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Apr 17, 2023 at 9:12 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Apr 7, 2023 at 6:37 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Apr 6, 2023 at 6:57 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > >
> > > While writing a PoC patch, I found some difficulties in this idea.
> > > First, I tried to add schemaname+relname to pg_subscription_rel but I
> > > could not define the primary key of pg_subscription_rel. The primary
> > > key on (srsubid, srrelid) doesn't work since srrelid could be NULL.
> > > Similarly, the primary key on (srsubid, srrelid, schemaname, relname)
> > > also doesn't work.
> > >
> >
> > Can we think of having a separate catalog table say
> > pg_subscription_remote_rel for this? You can have srsubid,
> > remote_schema_name, remote_rel_name, etc. We may need some other state
> > to be maintained during the initial schema sync where this table can
> > be used. Basically, this can be used to maintain the state till the
> > initial schema sync is complete because we can create a relation entry
> > in pg_subscription_rel only after the initial schema sync is
> > complete.
>
> It might not be ideal but I guess it works. But I think we need to
> modify the name of replication slot for initial sync as it currently
> includes OID of the table:
>
> void
> ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
>                                 char *syncslotname, Size szslot)
> {
>     snprintf(syncslotname, szslot, "pg_%u_sync_%u_" UINT64_FORMAT, suboid,
>              relid, GetSystemIdentifier());
> }
>
> If we use both schema name and table name, it's possible that slot
> names are duplicated if schema and/or table names are long. Another
> idea is to use the hash value of schema+table names, but it cannot
> completely eliminate that possibility, and probably would make
> investigation and debugging hard in case of any failure. Probably we
> can use the OID of each entry in pg_subscription_remote_rel instead,
> but I'm not sure it's a good idea, mainly because we will end up using
> twice as many OIDs as before.
>

The other possibility is to use worker_pid. To make debugging easier,
we may want to log the mapping from schema_name+rel_name to slot_name
at DEBUG1 log level.
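
A minimal standalone sketch of the worker_pid idea, written outside the
PostgreSQL tree so it compiles on its own. The function names
slot_name_for_tablesync_pid, slot_name_from_names and log_slot_mapping
are hypothetical and not part of any patch; sysid stands in for
GetSystemIdentifier(), and fprintf stands in for the DEBUG1 message.
The name-based variant is included only to show how long schema and
table names can collapse to the same slot name once truncated.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* PostgreSQL's default identifier limit; a slot name holds at most
 * NAMEDATALEN - 1 bytes. */
#define NAMEDATALEN 64

/*
 * Hypothetical variant of ReplicationSlotNameForTablesync() that puts the
 * worker PID where the current code puts the table OID. All components are
 * bounded-length integers, so the name cannot be truncated into a duplicate.
 */
static void
slot_name_for_tablesync_pid(uint32_t suboid, int worker_pid, uint64_t sysid,
                            char *syncslotname, size_t szslot)
{
    assert(szslot > 0);
    snprintf(syncslotname, szslot, "pg_%" PRIu32 "_sync_%d_%" PRIu64,
             suboid, worker_pid, sysid);
}

/*
 * Hypothetical name-based variant, shown only to illustrate the problem:
 * snprintf silently truncates, so two long names can yield the same slot.
 */
static void
slot_name_from_names(uint32_t suboid, const char *schema, const char *rel,
                     char *syncslotname, size_t szslot)
{
    assert(szslot > 0);
    snprintf(syncslotname, szslot, "pg_%" PRIu32 "_sync_%s_%s",
             suboid, schema, rel);
}

/*
 * Stand-in for the proposed DEBUG1 message. Inside the server this would
 * presumably be an elog(DEBUG1, ...) call; that is an assumption here.
 */
static void
log_slot_mapping(const char *schema, const char *rel, const char *slotname)
{
    fprintf(stderr, "DEBUG1: tablesync slot \"%s\" is for table \"%s.%s\"\n",
            slotname, schema, rel);
}
```

With a PID in the name, the slot no longer identifies the table by
itself, which is why the logged mapping matters for debugging.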

--
With Regards,
Amit Kapila.

In response to

Re: Initial Schema Sync for Logical Replication at 2023-04-17 03:41:29 from Masahiko Sawada

Responses

Re: Initial Schema Sync for Logical Replication at 2023-04-21 08:47:31 from Masahiko Sawada
