Re: Potential data loss due to race condition during logical replication slot creation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential data loss due to race condition during logical replication slot creation
Date: 2024-06-24 03:54:46
Message-ID: CAA4eK1JbN=jTNtejjk3Jb-4Gr6mZ7PXRn6A_Boda0afB0js1oQ@mail.gmail.com
Lists: pgsql-bugs

On Fri, Jun 21, 2024 at 12:16 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Approach (a) has a downside: it will lead to tracking more
> > transactions (non-catalog) than required, without any benefit for the
> > user. Considering that is true, I wouldn't prefer that approach.
>
> Yes, it will lead to tracking non-catalog-change transactions as well.
> If there are many subtransactions, the overhead could be noticeable.
> But it happens only once when creating a slot.
>

True, but it doesn't seem advisable to add such overhead, even at
slot creation time, without a concrete reason.

> Another variant of (a) is that we skip snapshot restores if the
> initial_xmin_horizon is a valid transaction id. The
> initial_xmin_horizon is always set to a valid transaction id when
> initializing the decoding context, e.g. during
> CreateInitDecodingContext(). That way, we don't need to track
> non-catalog-change transactions. A downside is that this approach
> assumes that DecodingContextFindStartpoint() is called with the
> decoding context created by CreateInitDecodingContext(), which is true
> in the core code, but might not be true in third-party extensions.
>

I think it is better to be explicit in this case rather than relying
on initial_xmin_horizon. So, we can store an in_create/create_in_progress
flag in SnapBuild in HEAD and store it in LogicalDecodingContext
in the back branches. I think changing SnapBuild means we have to update
SNAPBUILD_VERSION, right? Is it a good idea to do that at this point,
or shall we wait for the new branch to open and change it there? Either
way, that is only a few days away, and in the meantime we can review the
patches and keep them ready.
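
To make the difference between the two checks concrete, here is a rough
standalone C sketch. It is not PostgreSQL source and not the proposed
patch: SketchBuilder and every sketch_* name are made up for
illustration, and it assumes the flag is set when the start-point search
begins and cleared when it ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standalone stand-ins; none of this is PostgreSQL source. */
typedef uint32_t SketchXid;
#define SKETCH_INVALID_XID ((SketchXid) 0)

/* Hypothetical, heavily reduced model of the snapshot builder state. */
typedef struct SketchBuilder
{
    SketchXid initial_xmin_horizon; /* valid only if the creation path set it */
    bool      in_slot_creation;     /* hypothetical explicit flag */
} SketchBuilder;

/* Implicit variant: infer "slot is being created" from the horizon. */
static bool
sketch_may_restore_implicit(const SketchBuilder *b)
{
    return b->initial_xmin_horizon == SKETCH_INVALID_XID;
}

/* Explicit variant: consult a flag owned by the start-point search. */
static bool
sketch_may_restore_explicit(const SketchBuilder *b)
{
    return !b->in_slot_creation;
}

/* Assumption: the start-point search marks the builder on entry ... */
static void
sketch_begin_find_startpoint(SketchBuilder *b)
{
    b->in_slot_creation = true;
}

/* ... and clears the mark once a start point has been found. */
static void
sketch_finish_find_startpoint(SketchBuilder *b)
{
    assert(b->in_slot_creation);
    b->in_slot_creation = false;
}
```

In this model, a caller that runs the start-point search on a context
whose initial_xmin_horizon was never set would still be allowed to
restore a snapshot under the implicit check, while the explicit flag
blocks the restore regardless of how the context was created.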

--
With Regards,
Amit Kapila.
