Re: Potential data loss due to race condition during logical replication slot creation

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential data loss due to race condition during logical replication slot creation
Date: 2024-07-08 14:02:44
Message-ID: CAD21AoCbarwpVm77oF1tAv__FGqSDE34bu0NjvK6By4Un8YK9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Jul 8, 2024 at 6:01 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> Thanks for creating the patch!
>
> > I've attached updated patches for HEAD and pg17 for now (I will create
> > the patch for other backbranches).
> >
> > In the patches, I used a different approach in between HEAD and
> > backbranches. On HEAD, we store a flag indicating whether or not we
> > should skip snapshot restores into the SnapBuild struct and set it
> > only while finding the start point. Therefore we have to bump
> > SNAPBUILD_VERSION. On backbranches, I used the approach we discussed
> > above; store the flag in LogicalDecodingContext and set it when
> > creating the LogicalDecodingContext for a new logical slot. A possible
> > downside of the approach taken for backbranches is that we implicitly
> > require for users to use the same LogicalDecodingContext for both
> > initializing the context for a new slot and finding its start point.
> > IIUC it was not strictly required. This restriction would not be a
> > problem at least in the core, but I'm not sure if there are no
> > external extensions that create a logical slot in that way. This is
> > the main reason why I used a different approach on HEAD and
> > backbranches. Therefore, if it does not matter, I think we can use the
> > same approach on all branches, which is better in terms of
> > maintainability.
>
> I want to confirm your point. You meant that you wanted to unify appraoches,
> especially you wanted to store the flag in LogicalDecodingContext, rigth?

Yes. Ideally I'd like to use the same approach in all branches
regardless of how to fix it for better maintainability.

> I
> briefly grepped github repos with "DecodingContextFindStartpoint", and cannot
> find such pattern. E.g., [1]. But this point must be pointed by extension developers.

Thank you for searching on github. It will also affect the future
extension developments so I'm slightly hesitant to do that.

> >
> > Also, I excluded the test case for the problem that Kuroda-san
> > reported[1] since the reported problem occurred due to the same cause
> > of the problem originally reported on this thread. The bug that
> > decodes only partial transactions could lead to various symptoms. IIUC
> > these test cases test the same behavior.
>
> Yes, I also think they are caused by the same root cause, so we can skip.
>
> Comments for HEAD patch:
> 1.
> You must modify test_decoding/meson.build. It was also missing in patches for
> backbranches.
>
> 2.
> The test code needs cleanup. E.g.,
> - Steps "s0_insert_cat" and "s0_savepoint" is not used
> - Table user_cat is not used
> It was also missing in patches for backbranches.

Will fix the above points in the next version patch.

>
> 3.
> Just to confirm - In SnapshotRestore(), the added attribute is not restored
> from the disk. This is intentional because 1) it has been set to false when it
> was serilizing to disk and 2) the destination (SnapBuild *builder) is initialized
> by palloc0() so it was set to false. Is it right?

Right.

> Comments for backblanches patch:
>
> 1.
> According to wikipage [2], new attribute must be at the end of struct.

Good point. Will fix.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Haifang Wang (Centific Technologies Inc) 2024-07-08 15:52:40 RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 51961374
Previous Message Sandeep Thakkar 2024-07-08 12:58:01 Re: Error when installing PostgreSQL 16.3.2 under system context NT AUTHORITY\SYSTEM