Re: long-standing data loss bug in initial sync of logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nitin Motiani <nitinmotiani(at)google(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: long-standing data loss bug in initial sync of logical replication
Date: 2025-02-28 06:42:12
Message-ID: CAA4eK1LZFthizh6LOy7D2-5Kf7vrMJfUqkV3AcYZN0CagCEVJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 28, 2025 at 9:45 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Feb 28, 2025 at 6:15 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Feb 27, 2025 at 12:14 AM Zhijie Hou (Fujitsu)
> > >
> > > I think distributing invalidations to a transaction that has not yet built a
> > > base snapshot is un-necessary. This is because, during the process of building
> > > its base snapshot, such a transaction will have already recorded the XID of the
> > > transaction that altered the publication information into its array of
> > > committed XIDs. Consequently, it will reflect the latest changes in the catalog
> > > from the beginning. In the context of logical decoding, this scenario is
> > > analogous to decoding a new transaction initiated after the catalog-change
> > > transaction has been committed.
> > >
> > > The original issue arises because the catalog cache was constructed using an
> > > outdated snapshot that could not reflect the latest catalog changes. However,
> > > this is not a problem in cases without a base snapshot. Since the existing
> > > catalog cache should have been invalidated upon decoding the committed
> > > catalog-change transaction, the subsequent transactions will construct a new
> > > cache with the latest snapshot.
> >
> > I've also concluded it's not necessary but the reason and analysis
> > might be somewhat different.
>

Based on the discussion on this point and Hou-San's proposed comment,
I have tried to add/edit a few comments in 0001 patch. See, if those
make sense to you, it is important to capture the reason and theory we
discussed here in the form of comments so that it will be easy to
remember the reason in the future.

--
With Regards,
Amit Kapila.

Attachment Content-Type Size
v17-0001-Distribute-invalidatons-if-change-in-catalog-tab.patch application/octet-stream 15.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-02-28 07:27:29 Re: Get rid of WALBufMappingLock
Previous Message Pavel Stehule 2025-02-28 06:29:49 Re: SQLFunctionCache and generic plans