Re: Potential data loss due to race condition during logical replication slot creation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential data loss due to race condition during logical replication slot creation
Date: 2024-06-25 04:24:10
Message-ID: CAA4eK1LbpagXYw6eP+qBz2SYjQ3x26ZdVCJBkp033aExqA2MbQ@mail.gmail.com
Lists: pgsql-bugs

On Mon, Jun 24, 2024 at 10:32 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Jun 24, 2024 at 12:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Jun 21, 2024 at 12:16 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > Approach (a) has a downside: it will lead to tracking more
> > > > transactions (non-catalog) than required without any benefit for the
> > > > user. Given that, I wouldn't prefer that approach.
> > >
> > > Yes, it will lead to tracking non-catalog-change transactions as well.
> > > If there are many subtransactions, the overhead could be noticeable.
> > > But it happens only once when creating a slot.
> > >
> >
> > True, but it doesn't seem advisable to add such an overhead even
> > during create time without any concrete reason.
> >
> > > Another variant of (a) is that we skip snapshot restores if the
> > > initial_xmin_horizon is a valid transaction id. The
> > > initial_xmin_horizon is always set to a valid transaction id when
> > > initializing the decoding context, e.g. during
> > > CreateInitDecodingContext(). That way, we don't need to track
> > > non-catalog-change transactions. A downside is that this approach
> > > assumes that DecodingContextFindStartpoint() is called with the
> > > decoding context created by CreateInitDecodingContext(), which is true
> > > in the core code, but might not be true in third-party extensions.
> > >
> >
> > I think it is better to be explicit in this case rather than relying
> > on initial_xmin_horizon. So, we can store an in_create/create_in_progress
> > flag in SnapBuild in HEAD and in LogicalDecodingContext
> > in the back branches.
>
> I think we cannot access the flag in LogicalDecodingContext from
> snapbuild.c, at least in the back branches. I've discussed adding such a
> flag in snapbuild.c as a global variable, but I'm slightly hesitant to
> add a global variable besides InitialRunningXacts.
>

I agree that adding a global variable is not advisable. Can we pass
the flag stored in LogicalDecodingContext to snapbuild.c? That might
not be elegant, but I don't have any better ideas.
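A minimal sketch of the idea in C, assuming heavily simplified stand-ins: the names SketchSnapBuild, SketchDecodingContext, sketch_restore_serialized and sketch_process_running_xacts are hypothetical and are not PostgreSQL's actual definitions. It only shows the shape of the proposal: the creation-in-progress flag stays in the decoding context and is passed to the snapshot-builder code as an argument, so no new global is needed in snapbuild.c, and the serialized-snapshot shortcut is skipped while the slot is being created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the snapshot builder. */
typedef struct SketchSnapBuild
{
    bool restored_from_disk;    /* set once a serialized snapshot is used */
} SketchSnapBuild;

/* Hypothetical stand-in for LogicalDecodingContext; the flag lives here. */
typedef struct SketchDecodingContext
{
    bool in_create;             /* true while the slot is being created */
    SketchSnapBuild *snapshot_builder;
} SketchDecodingContext;

/* Hypothetical stand-in for loading a serialized snapshot from disk. */
static bool
sketch_restore_serialized(SketchSnapBuild *builder, bool snapshot_on_disk)
{
    if (!snapshot_on_disk)
        return false;
    builder->restored_from_disk = true;
    return true;
}

/*
 * The caller passes ctx->in_create explicitly, so the snapshot-builder code
 * needs neither access to the decoding context nor a new global variable.
 * Returns true only if a serialized snapshot was used as a shortcut.
 */
static bool
sketch_process_running_xacts(SketchSnapBuild *builder, bool in_create,
                             bool snapshot_on_disk)
{
    assert(builder != NULL);

    /* While creating the slot, never take the serialized-snapshot shortcut. */
    if (in_create)
        return false;

    return sketch_restore_serialized(builder, snapshot_on_disk);
}
```

A caller would invoke it as `sketch_process_running_xacts(ctx->snapshot_builder, ctx->in_create, ...)`; the extra parameter is the less elegant part, but it keeps the state local to the decoding context.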

> > I think changing SnapBuild means we have to update
> > SNAPBUILD_VERSION, right? Is that a good idea to do at this point of
time, or shall we wait for the new branch to open and change it there? Anyway,
> > it would be a few days away and in the meantime, we can review and
> > keep the patches ready.
>
> I think we should wait to add such changes that break on-disk
> compatibility until a new branch opens. On HEAD, I think we can add a
> new flag in SnapBuild and set it during, say,
> DecodingContextFindStartpoint().
>

Fair enough.
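Why a SnapBuild change touches on-disk compatibility can be illustrated with a minimal sketch, assuming a hypothetical serialized layout: SKETCH_SNAPBUILD_VERSION, SketchBuilderState, SketchOnDisk, sketch_serialize and sketch_restore are illustrative only, not PostgreSQL's actual format. In this sketch the builder struct is copied into the file byte for byte, so adding a field changes the layout, and a file written with another version must be refused rather than misread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical version number; bumped whenever SketchBuilderState changes. */
#define SKETCH_SNAPBUILD_VERSION 2

/* Hypothetical builder state; in_slot_creation is the newly added field. */
typedef struct SketchBuilderState
{
    uint32_t xmin;
    bool in_slot_creation;
} SketchBuilderState;

/* Hypothetical on-disk image: a version header plus a raw struct copy. */
typedef struct SketchOnDisk
{
    uint32_t version;
    SketchBuilderState builder;
} SketchOnDisk;

static void
sketch_serialize(const SketchBuilderState *builder, SketchOnDisk *ondisk)
{
    memset(ondisk, 0, sizeof(*ondisk));
    ondisk->version = SKETCH_SNAPBUILD_VERSION;
    memcpy(&ondisk->builder, builder, sizeof(SketchBuilderState));
}

/*
 * Refuse images written with any other version: their bytes were laid out
 * for a different struct definition and cannot be reinterpreted safely.
 */
static bool
sketch_restore(const SketchOnDisk *ondisk, SketchBuilderState *builder)
{
    assert(ondisk != NULL && builder != NULL);
    if (ondisk->version != SKETCH_SNAPBUILD_VERSION)
        return false;
    memcpy(builder, &ondisk->builder, sizeof(SketchBuilderState));
    return true;
}
```

The version check is what turns a silent misread into a clean refusal, which is why such a layout change is better made when a new branch opens than in a released one.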

--
With Regards,
Amit Kapila.
