Re: Potential data loss due to race condition during logical replication slot creation

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential data loss due to race condition during logical replication slot creation
Date: 2024-04-16 03:56:30
Message-ID: CAD21AoDzjFxFiRYE-Qpxe83weewegbWf7f+zK3r4pSFZwxW_rw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Mar 18, 2024 at 6:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Feb 19, 2024 at 2:43 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Feb 8, 2024 at 2:54 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Sat, Feb 3, 2024 at 12:12 PM Callahan, Drew <callaan(at)amazon(dot)com> wrote:
> > > >
> > > > > I think we can prepare an isolation test of the above steps and
> > > > > include it in contrib/test_decoding/specs.
> > > >
> > > > Thanks for the suggestion. Included an isolation test for the
> > > > exact repro and a couple other tests for fully before and fully after.
> > > > These two tests don't add a whole lot of value since this
> > > > is already heavily tested elsewhere, so I'm good with removing.
> > > > However, thought it made sense to include from a perspective of
> > > > totality given the test name.
> > > >
> > > > > Another simple way to fix this is that we always set
> > > > > need_full_snapshot to true when slot creation. But it would then make
> > > > > the snapshot builder include non-catalog-change transactions too while
> > > > > finding the initial startpoint. Which is not necessary.
> > > >
> > > > Since we're not accumulating changes anyway and we never distribute snapshots
> > > > while inconsistent it wouldn't cause any memory pressure in the ReorderBuffer.
> > > > This seems pretty contained and wouldn't require the use of a global variable
> > > > and memory context callbacks which is a bit nicer from a readability standpoint.
> > >
> > > Sorry I missed this comment.
> > >
> > > I think it's a good point that this fix doesn't lead to any on-disk
> > > compatibility while not affecting existing users much, especially for
> > > back-patching. If we want to choose this approach for bank branches,
> > > we need to carefully consider the side impacts.
> > >
> > > > It's a little inefficient with memory and CPU, especially in the presence of
> > > > long running transactions, but likely would not be noticeable to most users.
> > >
> > > This matches my analysis.
> > >
> > > Here is the summary of several proposals we've discussed:
> > >
> > > a) Have CreateInitDecodingContext() always pass need_full_snapshot =
> > > true to AllocateSnapshotBuilder().
> > >
> > > This is the simplest approach and doesn't break the on-disk
> > > SnapBuildOnDisck compatibility. A downside is that it's a little
> > > inefficient with memory and CPU since it includes non-catalog change
> > > transactions too. Also, passing need_full_snapshot to
> > > CreateInitDecodingContext() will no longer have any meaning.
> > >
> > > b) Have snapbuild.c being able to handle multiple SnapBuildOnDisk versions.
> > >
> > > This bumps SNAPBUILD_VERSION and therefore breaks the on-disk
> > > SnapBuildOnDisk compatibility, but snapbuild.c can handle multiple
> > > versions. But since different branches are using different versions of
> > > SnapBuildOnDisk, it would probably be hard to have the SnapBuildOnDisk
> > > version that is consistent across all versions.
> > >
> > > c) Add a global variable, say in_create, to snapbuild.c
> > >
> > > it would also require adding a setter function for in_create. There
> > > are several ideas where to set/reset the flag. One idea is that we
> > > reset the flag in AllocateSnapshotBuilder() and set the flag anywhere
> > > before starting to find the start point, for example at the beginning
> > > of DecodingContextFindStartpoint(). It probably won't require a memory
> > > context callback to make sure to clear the flag. This idea doesn't
> > > have a downside from users and extensions perspects. But I'm slightly
> > > hesitant to add a global variable.
> > >
> > > What do you think? and any other ideas?
> >
> > I've drafted the idea (c) for discussion (for master and v16 for now).
> > I also liked the idea (a) but I'm concerned a bit about future impact.
> >
>
> I agree that (c) would be better than (a) as it avoids collecting
> non-catalog xacts in snapshot. However, I think we shouldn't avoid
> restoring the snapshot unless really required.

Pondering further, I came across the question; in what case we would
need to restore the snapshot and jump to the consistent state when
finding the initial start point?

When creating a slot, in ReplicationSlotReserveWal(), we determine the
restart_lsn and write a RUNNING_XACT record. This record would be the
first RUNNING_XACT record that the logical decoding decodes in most
cases. In SnapBuildFindSnapshot(), if the record satisfies (a) (i.e.
running->oldestRunningXid == running->nextXid), it can jump to the
consistent state. If not, it means there were running transactions at
that time of the RUNNING_XACT record being produced. Therefore, we
must not restore the snapshot and jump to the consistent state.
Because otherwise, we end up deciding the start point in the middle of
a transaction that started before the RUNNING_XACT record. Probably
the same is true for all subsequent snapbuild states.

I might be missing something but I could not find the case where we
can or want to restore the serialized snapshot when finding the
initial start point. If my analysis is correct, we can go with either
(a) or (c) I proposed before[1]. Between these two options, it also
could be an option that (a) is for backbranches for safety and (c) is
for master.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoBH4NWdRcEjXpBFHSKuVO6wia-vHHHaKuEi-h4i4wbi8A%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Noah Misch 2024-04-16 03:58:25 Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Previous Message David G. Johnston 2024-04-16 03:15:18 Upgrading an extension's extnamespace from user-specified to a defined schema breaks dump/restore