Re: Potential data loss due to race condition during logical replication slot creation

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Callahan, Drew" <callaan(at)amazon(dot)com>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential data loss due to race condition during logical replication slot creation
Date: 2024-02-19 09:12:23
Message-ID: CAD21AoBJcRiVBT5_ys13wA9c-8zbbqQJsMiVELX3t+HREwJwbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Feb 8, 2024 at 2:54 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sat, Feb 3, 2024 at 12:12 PM Callahan, Drew <callaan(at)amazon(dot)com> wrote:
> >
> > > I think we can prepare an isolation test of the above steps and
> > > include it in contrib/test_decoding/specs.
> >
> > Thanks for the suggestion. Included an isolation test for the
> > exact repro and a couple other tests for fully before and fully after.
> > These two tests don't add a whole lot of value since this
> > is already heavily tested elsewhere, so I'm good with removing.
> > However, thought it made sense to include from a perspective of
> > totality given the test name.
> >
> > > Another simple way to fix this is that we always set
> > > need_full_snapshot to true when slot creation. But it would then make
> > > the snapshot builder include non-catalog-change transactions too while
> > > finding the initial startpoint. Which is not necessary.
> >
> > Since we're not accumulating changes anyway and we never distribute snapshots
> > while inconsistent it wouldn't cause any memory pressure in the ReorderBuffer.
> > This seems pretty contained and wouldn't require the use of a global variable
> > and memory context callbacks which is a bit nicer from a readability standpoint.
>
> Sorry I missed this comment.
>
> I think it's a good point that this fix doesn't lead to any on-disk
> compatibility while not affecting existing users much, especially for
> back-patching. If we want to choose this approach for bank branches,
> we need to carefully consider the side impacts.
>
> > It's a little inefficient with memory and CPU, especially in the presence of
> > long running transactions, but likely would not be noticeable to most users.
>
> This matches my analysis.
>
> Here is the summary of several proposals we've discussed:
>
> a) Have CreateInitDecodingContext() always pass need_full_snapshot =
> true to AllocateSnapshotBuilder().
>
> This is the simplest approach and doesn't break the on-disk
> SnapBuildOnDisck compatibility. A downside is that it's a little
> inefficient with memory and CPU since it includes non-catalog change
> transactions too. Also, passing need_full_snapshot to
> CreateInitDecodingContext() will no longer have any meaning.
>
> b) Have snapbuild.c being able to handle multiple SnapBuildOnDisk versions.
>
> This bumps SNAPBUILD_VERSION and therefore breaks the on-disk
> SnapBuildOnDisk compatibility, but snapbuild.c can handle multiple
> versions. But since different branches are using different versions of
> SnapBuildOnDisk, it would probably be hard to have the SnapBuildOnDisk
> version that is consistent across all versions.
>
> c) Add a global variable, say in_create, to snapbuild.c
>
> it would also require adding a setter function for in_create. There
> are several ideas where to set/reset the flag. One idea is that we
> reset the flag in AllocateSnapshotBuilder() and set the flag anywhere
> before starting to find the start point, for example at the beginning
> of DecodingContextFindStartpoint(). It probably won't require a memory
> context callback to make sure to clear the flag. This idea doesn't
> have a downside from users and extensions perspects. But I'm slightly
> hesitant to add a global variable.
>
> What do you think? and any other ideas?

I've drafted the idea (c) for discussion (for master and v16 for now).
I also liked the idea (a) but I'm concerned a bit about future impact.

>
> As a separate topic, I think that this backpatching complexity comes
> from the fact that we're serializing a whole SnapBuild to the disk
> although we don't need to serialize all of its fields. In order to
> make future back-patching easy, it might be worth considering
> decoupling the fields that need to be serialized from SnapBuild
> struct.

If we implement this idea, we won't need to bump SNAPBUILD_VERSION for HEAD.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
master_fix_snapbuild_bug_v3.patch application/octet-stream 5.3 KB
v16_fix_snapbuild_bug_v3.patch application/octet-stream 5.3 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Amit Langote 2024-02-19 09:45:21 Re: BUG #18344: Pruning tables partitioned by bool range fails with invalid strategy
Previous Message Tender Wang 2024-02-19 08:21:11 Re: BUG #18344: Pruning tables partitioned by bool range fails with invalid strategy