Re: Potential data loss due to race condition during logical replication slot creation

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential data loss due to race condition during logical replication slot creation
Date: 2024-06-25 08:26:11
Message-ID: CAD21AoDR3h78U0hxdzWPuPL11nvJCWYMB8h+QoOjd82ZmXjfgw@mail.gmail.com
Lists: pgsql-bugs

On Tue, Jun 25, 2024 at 1:24 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Jun 24, 2024 at 10:32 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Jun 24, 2024 at 12:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Jun 21, 2024 at 12:16 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > > > > Approach (a) has a downside: it will lead to tracking more
> > > > > > transactions (non-catalog) than required, without any benefit to
> > > > > > the user. Given that, I wouldn't prefer that approach.
> > > >
> > > > Yes, it will lead to tracking non-catalog-change transactions as well.
> > > > If there are many subtransactions, the overhead could be noticeable.
> > > > But it happens only once when creating a slot.
> > > >
> > >
> > > True, but it doesn't seem advisable to add such an overhead even
> > > during create time without any concrete reason.
> > >
> > > > > Another variant of (a) is that we skip snapshot restores if the
> > > > > initial_xmin_horizon is a valid transaction id. The
> > > > > initial_xmin_horizon is always set to a valid transaction id when
> > > > > initializing the decoding context, e.g. during
> > > > > CreateInitDecodingContext(). That way, we don't need to track
> > > > > non-catalog-change transactions. A downside is that this approach
> > > > > assumes that DecodingContextFindStartpoint() is called with the
> > > > > decoding context created by CreateInitDecodingContext(), which is true
> > > > > in the core code, but might not be true in third-party extensions.
> > > >
> > >
> > > > I think it is better to be explicit in this case rather than relying
> > > > on initial_xmin_horizon. So, we can store an in_create/create_in_progress
> > > > flag in SnapBuild in HEAD and store it in LogicalDecodingContext
> > > > in back branches.
> >
> > I think we cannot access the flag in LogicalDecodingContext from
> > snapbuild.c, at least in back branches. I've discussed adding such a
> > flag in snapbuild.c as a global variable, but I'm slightly hesitant to
> > add a global variable besides InitialRunningXacts.
> >
>
> I agree that adding a global variable is not advisable. Can we pass
> the flag stored in LogicalDecodingContext to snapbuild.c?

Ah, I found a good path: snapbuild->reorder->private_data (storing a
pointer to a LogicalDecodingContext). This assumes private_data always
stores a pointer to a LogicalDecodingContext, but I think that's fine
at least for back branches.
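
As a rough illustration of that pointer path (a sketch only, not taken
from the attached patch): the structs below are simplified stand-ins for
the real PostgreSQL definitions, the in_create field name follows the
suggestion quoted above, and both helper functions are hypothetical
names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified stand-ins for illustration only; the real PostgreSQL
 * structs carry many more fields.
 */
typedef struct LogicalDecodingContext
{
    bool        in_create;      /* assumed flag: slot creation in progress */
} LogicalDecodingContext;

typedef struct ReorderBuffer
{
    void       *private_data;   /* assumed to point at a LogicalDecodingContext */
} ReorderBuffer;

typedef struct SnapBuild
{
    ReorderBuffer *reorder;
} SnapBuild;

/*
 * Hypothetical helper: follow builder->reorder->private_data to reach
 * the decoding context and read the flag.  This is only valid while
 * private_data really stores a LogicalDecodingContext pointer.
 */
static bool
snapbuild_in_slot_creation(const SnapBuild *builder)
{
    const LogicalDecodingContext *ctx;

    assert(builder != NULL && builder->reorder != NULL);

    ctx = (const LogicalDecodingContext *) builder->reorder->private_data;
    return ctx != NULL && ctx->in_create;
}

/*
 * Hypothetical helper: a serialized snapshot may be restored only when
 * the slot is not being created.
 */
static bool
snapbuild_may_restore_snapshot(const SnapBuild *builder)
{
    return !snapbuild_in_slot_creation(builder);
}
```

With in_create set on the context, snapbuild_may_restore_snapshot()
returns false, so snapshot restores would be skipped during slot
creation without adding a global variable to snapbuild.c.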

I've attached the patch for this idea for PG16.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment: 0001-Fix-possibility-of-logical-decoding-partial-of-trans.patch (application/octet-stream, 11.5 KB)
