Re: Potential data loss due to race condition during logical replication slot creation

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential data loss due to race condition during logical replication slot creation
Date: 2024-07-08 03:01:23
Message-ID: CAD21AoBkbFZ8Fr51DY9fBWtpP-+3RGVrMtH_O96V8vgSs5eqkw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Jul 8, 2024 at 8:56 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Jul 05, 2024 at 11:52:52PM +0900, Masahiko Sawada wrote:
> > I've attached updated patches for HEAD and pg17 for now (I will create
> > the patch for other backbranches).
> >
> > In the patches, I used a different approach in between HEAD and
> > backbranches. On HEAD, we store a flag indicating whether or not we
> > should skip snapshot restores into the SnapBuild struct and set it
> > only while finding the start point. Therefore we have to bump
> > SNAPBUILD_VERSION. On backbranches, I used the approach we discussed
> > above; store the flag in LogicalDecodingContext and set it when
> > creating the LogicalDecodingContext for a new logical slot. A possible
> > downside of the approach taken for backbranches is that we implicitly
> > require for users to use the same LogicalDecodingContext for both
> > initializing the context for a new slot and finding its start point.
> > IIUC it was not strictly required. This restriction would not be a
> > problem at least in the core, but I'm not sure if there are no
> > external extensions that create a logical slot in that way. This is
> > the main reason why I used a different approach on HEAD and
> > backbranches. Therefore, if it does not matter, I think we can use the
> > same approach on all branches, which is better in terms of
> > maintainability.
>
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -189,6 +189,9 @@ struct SnapBuild
> /* Indicates if we are building full snapshot or just catalog one. */
> bool building_full_snapshot;
>
> + /* Indicates if we are finding the start point to extract changes */
> + bool finding_start_point;
> +
>
> FYI, I think that it is still OK to bump SNAPBUILD_VERSION on
> REL_17_STABLE. That will reduce by 1 year the time window required to
> maintain the tweaks implemented for the versions in the back-branches.
> So I'd suggest to do what the v17 version of the patch does for ~16,
> and use the snapshot format changes in 17~.

Thank you for the comment.

Agreed. If we get consensus on this approach to go, I'll use the patch
changing the SnapBuild struct for 17 and HEAD.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Harald.Munsch 2024-07-08 06:01:48 AW: Error when installing PostgreSQL 16.3.2 under system context NT AUTHORITY\SYSTEM
Previous Message Richard Guo 2024-07-08 02:33:39 Re: BUG #18522: Wrong results with Merge Right Anti Join, inconsistent with Merge Anti Join