Re: Initial Schema Sync for Logical Replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Initial Schema Sync for Logical Replication
Date: 2023-03-27 02:47:01
Message-ID: CAD21AoAzibaj_vyogYk5c9wCkP5T1n8s-iqPMbrTdxQ3JZ54Rg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 24, 2023 at 11:51 PM Kumar, Sachin <ssetiya(at)amazon(dot)com> wrote:
>
> > From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > I think we won't be able to use the same snapshot because the transaction
> > > will be committed.
> > > In CreateSubscription() we can use the transaction snapshot from
> > > walrcv_create_slot() until walrcv_disconnect() is called. (I am not sure
> > > about this part; maybe walrcv_disconnect() commits internally?)
> > > So somehow we need to keep this snapshot alive even after the transaction
> > > is committed (or delay committing the transaction, but we can have
> > > CREATE SUBSCRIPTION with ENABLED=FALSE, so there can be a restart
> > > before tableSync is able to use the same snapshot).
> > >
> >
> > Can we think of getting the table data as well, along with the schema, via
> > pg_dump? Won't both the schema and the initial data then correspond to the
> > same snapshot?
>
> Right, that will work. Thanks!

While it works, we cannot get the initial data in parallel, no?

>
> > > I think we can have the same issue as you mentioned: a new table t1 is
> > > added to the publication, and the user does a refresh publication.
> > > pg_dump / pg_restore restores the table definition, but before
> > > tableSync can start, steps 2 to 5 happen on the publisher.
> > > > 1. Create Table t1(c1, c2); --LSN: 90
> > > > 2. Insert t1 (1, 1); --LSN 100
> > > > 3. Insert t1 (2, 2); --LSN 110
> > > > 4. Alter t1 Add Column c3; --LSN 120
> > > > 5. Insert t1 (3, 3, 3); --LSN 130
> > > And table sync errors out
> > > There can be one more issue, since we took the pg_dump without a
> > > snapshot (with respect to the replication slot).
> > >
> >
> > To avoid both the problems mentioned for Refresh Publication, we can do
> > one of the following: (a) create a new slot along with a snapshot for this
> > operation and drop it afterward; or (b) using the existing slot, establish a
> > new snapshot using a technique proposed in email [1].
> >
>
> Thanks, I think option (b) will be perfect, since we don’t have to create a new slot.

Regarding (b), does it mean that the apply worker stops streaming,
requests creation of a new snapshot, and then resumes streaming?
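
For reference, the refresh scenario quoted above (steps 1 to 5) can be
sketched as a small simulation. This is only an illustrative model, not
PostgreSQL code: Event, columns_at(), rows_at() and initial_sync() are
hypothetical names, and a "snapshot" is reduced to a single LSN. It assumes
the copy fails whenever the publisher's column list at the data snapshot
differs from the column list that was restored on the subscriber.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    lsn: int
    kind: str       # "create", "add_column", or "insert"
    payload: tuple  # column names, or the values of one inserted row


# Publisher history, copied from steps 1 to 5 of the quoted example.
EVENTS = [
    Event(90, "create", ("c1", "c2")),
    Event(100, "insert", (1, 1)),
    Event(110, "insert", (2, 2)),
    Event(120, "add_column", ("c3",)),
    Event(130, "insert", (3, 3, 3)),
]


def columns_at(lsn):
    """Column list of t1 as seen by a snapshot taken at `lsn`."""
    cols = ()
    for ev in EVENTS:
        if ev.lsn <= lsn and ev.kind in ("create", "add_column"):
            cols += ev.payload
    return cols


def rows_at(lsn):
    """Rows of t1 visible at `lsn`; columns added later read as None."""
    width = len(columns_at(lsn))
    return [
        ev.payload + (None,) * (width - len(ev.payload))
        for ev in EVENTS
        if ev.lsn <= lsn and ev.kind == "insert"
    ]


def initial_sync(schema_lsn, data_lsn):
    """Restore the schema as of schema_lsn, then copy data as of data_lsn.

    Simplifying assumption: any column mismatch makes the copy fail.
    """
    target_cols = columns_at(schema_lsn)
    source_cols = columns_at(data_lsn)
    if source_cols != target_cols:
        raise ValueError(f"cannot copy {source_cols} into {target_cols}")
    return target_cols, rows_at(data_lsn)
```

In this model, initial_sync(90, 130) (schema dumped right after step 1,
data copied after step 5) raises an error, while initial_sync(130, 130)
succeeds because the schema and the data come from the same point. That is
the property both (a) and (b) are trying to provide.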

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
