From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Potential data loss due to race condition during logical replication slot creation |
Date: | 2024-07-09 05:29:52 |
Message-ID: | CAD21AoBV7odVHcTsXS2VNUKqw3J0AB_jaRNhL0UuXJ7=kOkLFw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Mon, Jul 8, 2024 at 11:02 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Jul 8, 2024 at 6:01 PM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Dear Sawada-san,
> >
> > Thanks for creating the patch!
> >
> > > I've attached updated patches for HEAD and pg17 for now (I will create
> > > the patch for other backbranches).
> > >
> > > In the patches, I used a different approach in between HEAD and
> > > backbranches. On HEAD, we store a flag indicating whether or not we
> > > should skip snapshot restores into the SnapBuild struct and set it
> > > only while finding the start point. Therefore we have to bump
> > > SNAPBUILD_VERSION. On backbranches, I used the approach we discussed
> > > above; store the flag in LogicalDecodingContext and set it when
> > > creating the LogicalDecodingContext for a new logical slot. A possible
> > > downside of the approach taken for backbranches is that we implicitly
> > > require for users to use the same LogicalDecodingContext for both
> > > initializing the context for a new slot and finding its start point.
> > > IIUC it was not strictly required. This restriction would not be a
> > > problem at least in the core, but I'm not sure if there are no
> > > external extensions that create a logical slot in that way. This is
> > > the main reason why I used a different approach on HEAD and
> > > backbranches. Therefore, if it does not matter, I think we can use the
> > > same approach on all branches, which is better in terms of
> > > maintainability.
> >
> > I want to confirm your point. You meant that you wanted to unify appraoches,
> > especially you wanted to store the flag in LogicalDecodingContext, rigth?
>
> Yes. Ideally I'd like to use the same approach in all branches
> regardless of how to fix it for better maintainability.
>
> > I
> > briefly grepped github repos with "DecodingContextFindStartpoint", and cannot
> > find such pattern. E.g., [1]. But this point must be pointed by extension developers.
>
> Thank you for searching on github. It will also affect the future
> extension developments so I'm slightly hesitant to do that.
>
> > >
> > > Also, I excluded the test case for the problem that Kuroda-san
> > > reported[1] since the reported problem occurred due to the same cause
> > > of the problem originally reported on this thread. The bug that
> > > decodes only partial transactions could lead to various symptoms. IIUC
> > > these test cases test the same behavior.
> >
> > Yes, I also think they are caused by the same root cause, so we can skip.
> >
> > Comments for HEAD patch:
> > 1.
> > You must modify test_decoding/meson.build. It was also missing in patches for
> > backbranches.
> >
> > 2.
> > The test code needs cleanup. E.g.,
> > - Steps "s0_insert_cat" and "s0_savepoint" is not used
> > - Table user_cat is not used
> > It was also missing in patches for backbranches.
>
> Will fix the above points in the next version patch.
>
> >
> > 3.
> > Just to confirm - In SnapshotRestore(), the added attribute is not restored
> > from the disk. This is intentional because 1) it has been set to false when it
> > was serilizing to disk and 2) the destination (SnapBuild *builder) is initialized
> > by palloc0() so it was set to false. Is it right?
>
> Right.
>
> > Comments for backblanches patch:
> >
> > 1.
> > According to wikipage [2], new attribute must be at the end of struct.
>
> Good point. Will fix.
>
I've attached the new version patches for all branches.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
REL14_v2-0001-Fix-possibility-of-logical-decoding-partial-trans.patch | application/octet-stream | 11.7 KB |
master_v2-0001-Fix-possibility-of-logical-decoding-partial-trans.patch | application/octet-stream | 12.3 KB |
REL16_v2-0001-Fix-possibility-of-logical-decoding-partial-trans.patch | application/octet-stream | 12.0 KB |
REL15_v2-0001-Fix-possibility-of-logical-decoding-partial-trans.patch | application/octet-stream | 11.7 KB |
REL17_v2-0001-Fix-possibility-of-logical-decoding-partial-trans.patch | application/octet-stream | 12.3 KB |
REL12_v2-0001-Fix-possibility-of-logical-decoding-partial-trans.patch | application/octet-stream | 11.7 KB |
REL13_v2-0001-Fix-possibility-of-logical-decoding-partial-trans.patch | application/octet-stream | 11.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2024-07-09 09:27:30 | Re: Potential data loss due to race condition during logical replication slot creation |
Previous Message | Melanie Plageman | 2024-07-08 21:23:52 | Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae |