Re: speed up a logical replica setup

From: Noah Misch <noah(at)leadboat(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-06-26 19:21:16
Message-ID: 20240626192116.a8.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 25, 2024 at 09:50:59PM -0300, Euler Taveira wrote:
> On Tue, Jun 25, 2024, at 3:24 AM, Amit Kapila wrote:
> > On Tue, Jun 25, 2024 at 3:38 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > On Mon, Jun 24, 2024 at 05:20:21PM +0530, Amit Kapila wrote:
> > > > On Sun, Jun 23, 2024 at 11:52 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > > > > +static void
> > > > > > +create_publication(PGconn *conn, struct LogicalRepInfo *dbinfo)
> > > > > > +{
> > > > >
> > > > > > + appendPQExpBuffer(str, "CREATE PUBLICATION %s FOR ALL TABLES",
> > > > > > + ipubname_esc);
> > > > >
> > > > > This tool's documentation says it "guarantees that no transaction will be
> > > > > lost." I tried to determine whether achieving that will require something
> > > > > like the fix from
> > > > > https://postgr.es/m/flat/de52b282-1166-1180-45a2-8d8917ca74c6(at)enterprisedb(dot)com(dot)
> > > > > (Not exactly the fix from that thread, since that thread has not discussed the
> > > > > FOR ALL TABLES version of its race condition.) I don't know. On the one
> > > > > hand, pg_createsubscriber benefits from creating a logical slot after creating
> > > > > the publication. That snapbuild.c process will wait for running XIDs. On the
> > > > > other hand, an INSERT/UPDATE/DELETE acquires its RowExclusiveLock and builds
> > > > > its relcache entry before assigning an XID, so perhaps the snapbuild.c process
> > >
> > > Correction: it doesn't matter how the original INSERT/UPDATE/DELETE builds its
> > > relcache entry, just how pgoutput of the change builds the relcache entry from
> > > the historic snapshot.
> > >
> > > > > isn't enough to prevent that thread's race condition. What do you think?
> > > >
> > > > I am not able to imagine how the race condition discussed in the
> > > > thread you quoted can impact this patch. The problem discussed is
> > > > mainly the interaction when we are processing the changes in logical
> > > > decoding w.r.t concurrent DDL (Alter Publication ... Add Table). The
> > > > problem happens because we use the old cache state.
> > >
> > > Right. Taking the example from
> > > http://postgr.es/m/20231119021830.d6t6aaxtrkpn743y@awork3.anarazel.de, LSNs
> > > between what that mail calls 4) and 5) are not safely usable as start points.
> > > pg_createsubscriber evades that thread's problem if the consistent_lsn it
> > > passes to pg_replication_origin_advance() can't be in a bad-start-point LSN
> > > span. I cautiously bet the snapbuild.c process achieves that:
> > >
> > > > I am missing your
> > > > point about the race condition mentioned in the thread you quoted with
> > > > snapbuild.c. Can you please elaborate a bit more?
> > >
> > > When pg_createsubscriber calls pg_create_logical_replication_slot(), the key
> > > part starts at:
> > >
> > > /*
> > > * If caller needs us to determine the decoding start point, do so now.
> > > * This might take a while.
> > > */
> > > if (find_startpoint)
> > > DecodingContextFindStartpoint(ctx);
> > >
> > > Two factors protect pg_createsubscriber. First, (a) CREATE PUBLICATION
> > > committed before pg_create_logical_replication_slot() started. Second, (b)
> > > DecodingContextFindStartpoint() waits for running XIDs to complete, via the
> > > process described at the snapbuild.c "starting up in several stages" diagram.
> > > Hence, the consistent_lsn is not in a bad-start-point LSN span. It's fine
> > > even if the original INSERT populated all caches before CREATE PUBLICATION
> > > started and managed to assign an XID only after consistent_lsn. From the
> > > pgoutput perspective, that's indistinguishable from the transaction starting
> > > at its first WAL record, after consistent_lsn. The linked "long-standing data
> > > loss bug in initial sync of logical replication" thread doesn't have (a),
> > > hence its bug. How close is that to accurate?
> >
> > Yeah, this theory sounds right to me. The key point is that no DML
> > (processing of WAL corresponding to DML) before CREATE PUBLICATION ...
> > command would have reached pgoutput level because we would have waited
> > for it during snapbuild.c. Can we conclude that the race condition
> > discussed in the other thread won't impact this patch?
>
> As Noah said the key point is the CREATE PUBLICATION *before* creating the
> replication slots -- that wait transactions to complete.

Let's consider the transaction loss topic concluded. Thanks for contemplating
it. I've added an open item for the "dbname containing a space" topic.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2024-06-26 19:23:29 Re: Proposal: Document ABI Compatibility
Previous Message David E. Wheeler 2024-06-26 19:20:52 Re: Proposal: Document ABI Compatibility