Re: Potential data loss due to race condition during logical replication slot creation

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential data loss due to race condition during logical replication slot creation
Date: 2024-07-07 23:56:42
Message-ID: ZosrOu7RFnrrsuHL@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Jul 05, 2024 at 11:52:52PM +0900, Masahiko Sawada wrote:
> I've attached updated patches for HEAD and pg17 for now (I will create
> the patch for other backbranches).
>
> In the patches, I used a different approach in between HEAD and
> backbranches. On HEAD, we store a flag indicating whether or not we
> should skip snapshot restores into the SnapBuild struct and set it
> only while finding the start point. Therefore we have to bump
> SNAPBUILD_VERSION. On backbranches, I used the approach we discussed
> above; store the flag in LogicalDecodingContext and set it when
> creating the LogicalDecodingContext for a new logical slot. A possible
> downside of the approach taken for backbranches is that we implicitly
> require for users to use the same LogicalDecodingContext for both
> initializing the context for a new slot and finding its start point.
> IIUC it was not strictly required. This restriction would not be a
> problem at least in the core, but I'm not sure if there are no
> external extensions that create a logical slot in that way. This is
> the main reason why I used a different approach on HEAD and
> backbranches. Therefore, if it does not matter, I think we can use the
> same approach on all branches, which is better in terms of
> maintainability.

--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -189,6 +189,9 @@ struct SnapBuild
/* Indicates if we are building full snapshot or just catalog one. */
bool building_full_snapshot;

+ /* Indicates if we are finding the start point to extract changes */
+ bool finding_start_point;
+

FYI, I think that it is still OK to bump SNAPBUILD_VERSION on
REL_17_STABLE. That will reduce by 1 year the time window required to
maintain the tweaks implemented for the versions in the back-branches.
So I'd suggest to do what the v17 version of the patch does for ~16,
and use the snapshot format changes in 17~.
--
Michael

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Richard Guo 2024-07-08 02:33:39 Re: BUG #18522: Wrong results with Merge Right Anti Join, inconsistent with Merge Anti Join
Previous Message Alexander Lakhin 2024-07-07 11:00:00 Re: BUG #18509: Logical decoding behaves badly when processing a change record for a table with altered column