Quick Links

Re: long-standing data loss bug in initial sync of logical replication

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	"Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nitin Motiani <nitinmotiani(at)google(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject:	Re: long-standing data loss bug in initial sync of logical replication
Date:	2025-04-23 06:31:37
Message-ID:	CAA4eK1KUoa1yVOXfGUvgA2kVQ2VvfqWU1LSzzeGHM+6=sbcVkw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Apr 22, 2025 at 10:57 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Mar 18, 2025 at 2:55 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Mar 17, 2025 at 4:56 PM Hayato Kuroda (Fujitsu)
> > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > >
> > > > Regarding the PG13, it cannot be
> > > > applied
> > > > as-is thus some adjustments are needed. I will share it in upcoming posts.
> > >
> > > Here is a patch set for PG13. Apart from PG14-17, the patch could be created as-is,
> > > because...
> > >
> > > 1. WAL record for invalidation messages (XLOG_XACT_INVALIDATIONS) does not exist.
> > > 2. Thus the ReorderBufferChange for the invalidation does not exist.
> > > Our patch tries to distribute it but cannot be done as-is.
> > > 3. Codes assumed that invalidation messages can be added only once.
> > > 4. The timing when invalidation messages are consumed is limited:
> > > a. COMMAND_ID change is poped,
> > > b. start of decoding a transaction, or
> > > c. end of decoding a transaction.
> > >
> > > Above means that invalidations cannot be executed while being decoded.
> > > I created two patch sets to resolve the data loss issue. 0001 has less code
> > > changes but could resolve a part of issue, 0002 has huge changes but provides a
> > > complete solution.
> > >
> > > 0001 - mostly same as patches for other versions. ReorderBufferAddInvalidations()
> > > was adjusted to allow being called several times. As I said above,
> > > 0001 cannot execute inval messages while decoding the transacitons.
> > > 0002 - introduces new ReorderBufferChange type to indicate inval messages.
> > > It would be handled like PG14+.
> > >
> > > Here is an example. Assuming that the table foo exists on both nodes, a
> > > publication "pub" which publishes all tables, and a subscription "sub" which
> > > subscribes "pub". What if the workload is executed?
> > >
> > > ```
> > > S1 S2
> > > BEGIN;
> > > INSERT INTO foo VALUES (1)
> > > ALTER PUBLICATION pub RENAME TO pub_renamed;
> > > INSERT INTO foo VALUES (2)
> > > COMMIT;
> > > LR -> ?
> > > ```
> > >
> > > With 0001, tuples (1) and (2) would be replicated to the subscriber.
> > > An error "publication "pub" does not exist" would raise when new changes are done
> > > later.
> > >
> > > 0001+0002 works more aggressively; the error would raise when S1 transaction is decoded.
> > > The behavior is same as for patched PG14-PG17.
> > >
> >
> > I understand that with 0001 the fix is partial in the sense that
> > because invalidations are processed at the transaction end, the
> > changes of concurrent DDL will only be reflected for the next
> > transaction. Now, on one hand, it is prudent to not add a new type of
> > ReorderBufferChange in the backbranch (v13) but the change is not that
> > invasive, so we can go with it as well. My preference would be to go
> > with just 0001 for v13 to minimize the risk of adding new bugs or
> > breaking something unintentionally.
> >
> > Sawada-San, and others involved here, do you have any suggestions on
> > this matter?
>
> Sorry for the late response.
>
> I agree with just 0001 for v13 as 0002 seems invasive. Given that v13
> would have only a few releases until EOL and 0001 can deal with some
> cases in question, I'd like to avoid such invasive changes in v13.
>

Fair enough. OTOH, we can leave the 13 branch considering following:
(a) it is near EOL, (b) bug happens in rare cases (when the DDLs like
ALTER PUBLICATION ... ADD TABLE ... or ALTER TYPE ... that don't take
a strong lock on table happens concurrently to DMLs on the tables
involved in the DDL.), and (c) the complete fix is invasive, even
partial fix is not simple. I have a slight fear that if we make any
mistake in fixing it partially (of course, we can't see any today), we
may not even get a chance to fix it.

Now, if the above convinces you or someone else not to push the
partial fix in 13, then fine; otherwise, I'll push the 0001 to 13 day
after tomorrow.

--
With Regards,
Amit Kapila.

In response to

Re: long-standing data loss bug in initial sync of logical replication at 2025-04-22 17:26:51 from Masahiko Sawada

Responses

Re: long-standing data loss bug in initial sync of logical replication at 2025-04-23 16:58:20 from Masahiko Sawada

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Michael Paquier	2025-04-23 07:13:14	Re: Add Pipelining support in psql
Previous Message	shveta malik	2025-04-23 06:31:26	Re: Fix premature xmin advancement during fast forward decoding