From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Callahan, Drew" <callaan(at)amazon(dot)com> |
Cc: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Potential data loss due to race condition during logical replication slot creation |
Date: | 2024-02-05 02:06:34 |
Message-ID: | CAD21AoCsSkwcAYgcUMi2zKp8F=moJL4K1beGD6AV_=pHhfAM9Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Sat, Feb 3, 2024 at 12:12 PM Callahan, Drew <callaan(at)amazon(dot)com> wrote:
>
> > I think we can prepare an isolation test of the above steps and
> > include it in contrib/test_decoding/specs.
>
> Thanks for the suggestion. Included an isolation test for the
> exact repro and a couple other tests for fully before and fully after.
> These two tests don't add a whole lot of value since this
> is already heavily tested elsewhere, so I'm good with removing.
> However, thought it made sense to include from a perspective of
> totality given the test name.
Thank you for the patch for the isolation test! Since Vignesh also
shared the isolation patch I've merged them and modified some
comments. I've attached the patch. I'm going to merge the isolation
test patch to the patch to fix the issue after we get a consensus on
how to fix it.
>
> > Another simple way to fix this is that we always set
> > need_full_snapshot to true when slot creation. But it would then make
> > the snapshot builder include non-catalog-change transactions too while
> > finding the initial startpoint. Which is not necessary.
>
> Since we're not accumulating changes anyway and we never distribute snapshots
> while inconsistent it wouldn't cause any memory pressure in the ReorderBuffer.
> This seems pretty contained and wouldn't require the use of a global variable
> and memory context callbacks which is a bit nicer from a readability standpoint.
> It's a little inefficient with memory and CPU, especially in the presence of
> long running transactions, but likely would not be noticeable to most users.
>
> > As for the proposed patch, it uses a global variable, InCreate, and
> > memory context callback to reset it. While I believe this is for not
> > breaking any ABI as this bug exists on back branches too, I think it's
> > better to consider how to fix it in HEAD and then how to backpatch it.
> > We don't necessarily need to have the same fixes for all branches all
> > the time.
>
> Attached a patch that is ABI breaking, but removes the need for a global variable.
> This relies on pushing the context of whether we are in a create mode to the
> logical decoding context, which is cleaner, but not as clean as just pushing
> all the way down to the snapbuilder. I would push down to the snap builder,
> but the entire object is persisted to disk. This is problematic to me since the
> create flag would be persisted causing exisiting slots to restore a snapbuild
> that could have the create flag set to true. This is not problematic as is,
> since the variable would only be leveraged in SnapBuildRestore() which is only
> called while inconsistent, but seems like a future bug waiting to happen.
But IIUC in SnapBuildRestore(), we selectively restore variables from
serialized SnapBuild. For instance, building_full_snapshot and
initial_xmin_horizon are serialized to the disk but ignored when
restoring a snapshot. I think that even if we add the in_create flag
to SnapBuild, it would be ignored on snapshot restore.
I think that it's cleaner if we could pass in_creat flag to
AllocateSnapshotBuilder() than passing the flag down to
SnapBuildFindSnapshot() from standby_decode().
>
> We could always clear the flag when persisting, but that seems a bit hacky.
> We could also fork the snap builder into on-disk and in-memory structs, which
> would be convenient since we would not need to invalidate exisiting snapshots.
> However, not sure how people feel about the fork.
>
> > IIUC we need to skip snapshot restore only when we're finding the
> > initial start point. So probably we don't need to set InCreate when
> > calling create_logical_replication_slot() with find_startpoint =
> > false.
>
> Yep, that's correct. We can safely skip setting InCreate. We could move
> the setter function slotfuncs.c as one way of doing this. I lean towards
> setting it no matter what since it is harmless if we never try to find
> a decoding point and removes the need for future callers of
> CreateInitDecodingContext() from having to determine if it's safe to be false.
> Though I agree that InCreate may a bigger hammer than needed which may need
> to be modified in the future.
I agree to set the flag no matter what. The flag is needed for finding
the initial startpoint but it would be harmless to set it even if the
caller doesn't call DecodingContextFindStartpoint(). If the caller
skips to call DecodingContextFindStartpoint() it's the caller's
responsibility to set a sane LSNs to the slots before staring the
logical decoding. So it makes sense to me to remove the need for
callers to determine to set the flag, as you mentioned.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
isolation_test_v2.patch | application/octet-stream | 5.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2024-02-05 02:12:31 | Re: Potential data loss due to race condition during logical replication slot creation |
Previous Message | Tom Lane | 2024-02-05 00:16:40 | Re: BUG #18330: The query planner chooses the wrong plan when using the parallel aggregation function |