Re: Initial Schema Sync for Logical Replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, "Kumar, Sachin" <ssetiya(at)amazon(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Initial Schema Sync for Logical Replication
Date: 2023-03-22 02:58:54
Message-ID: CAD21AoD5aNCrQUN7j4tMhyW4aOyydQeR+7SfvK0kUkfT0YJ0Hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira <euler(at)eulerto(dot)com> wrote:
> >
> > On Mon, Mar 20, 2023, at 10:10 PM, Kumar, Sachin wrote:
> >
> > > From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
> > > Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> > > On 2023-Mar-15, Kumar, Sachin wrote:
> > >
> > > > 1. In CreateSubscription() when we create replication
> > > > slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we
> > > can use this snapshot later in the pg_dump.
> > > >
> > > > 2. Now we can call pg_dump with above snapshot from CreateSubscription.
> > >
> > > Overall I'm not on board with the idea that logical replication would depend on
> > > pg_dump; that seems like it could run into all sorts of trouble (what if calling
> > > external binaries requires additional security setup? what about pg_hba
> > > connection requirements? what about max_connections in tight
> > > circumstances?).
> > > what if calling external binaries requires additional security setup
> > I am not sure what kind of security restriction would apply in this case, maybe pg_dump
> > binary can be changed ?
> >
> > Using pg_dump as part of this implementation is not acceptable because we
> > expect the backend to be decoupled from the client. Besides that, pg_dump
> > provides all table dependencies (such as tablespaces, privileges, security
> > labels, comments); not all dependencies shouldn't be replicated.
> >
>
> I agree that in the initial version we may not support sync of all
> objects but why that shouldn't be possible in the later versions?
>
> > You should
> > exclude them removing these objects from the TOC before running pg_restore or
> > adding a few pg_dump options to exclude these objects. Another issue is related
> > to different version. Let's say the publisher has a version ahead of the
> > subscriber version, a new table syntax can easily break your logical
> > replication setup. IMO pg_dump doesn't seem like a good solution for initial
> > synchronization.
> >
> > Instead, the backend should provide infrastructure to obtain the required DDL
> > commands for the specific (set of) tables. This can work around the issues from
> > the previous paragraph:
> >
> ...
> > * don't need to worry about different versions.
> >
>
> AFAICU some of the reasons why pg_dump is not allowed to dump from the
> newer version are as follows: (a) there could be more columns in the
> newer version of the system catalog and then Select * type of stuff
> won't work because the client won't have knowledge of additional
> columns. (b) the newer version could have new features (represented by
> say new columns in existing catalogs or new catalogs) that the older
> version of pg_dump has no knowledge of and will fail to get that data
> and hence an inconsistent dump. The subscriber will easily be not in
> sync due to that.
>
> Now, how do we avoid these problems even if we have our own version of
> functionality similar to pg_dump for selected objects? I guess we will
> face similar problems.

Right. I think that such functionality needs to return DDL commands
that can be executed on the requested version.

> If so, we may need to deny schema sync in any such case.

Yes. Do we have any concrete use case where the subscriber is an older
version, in the first place?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2023-03-22 03:07:28 Re: Commitfest 2023-03 starting tomorrow!
Previous Message Greg Stark 2023-03-22 02:43:05 misplaced GUC in pqcomm.h -- where to put actual common variable though...?