Re: long-standing data loss bug in initial sync of logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nitin Motiani <nitinmotiani(at)google(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: long-standing data loss bug in initial sync of logical replication
Date: 2025-03-18 09:55:39
Message-ID: CAA4eK1LZpwcyYQbXt4hJd1nkepV3qyuv28sfN949Kx_9GyOrVA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 17, 2025 at 4:56 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> > Regarding the PG13, it cannot be
> > applied
> > as-is thus some adjustments are needed. I will share it in upcoming posts.
>
> Here is a patch set for PG13. Apart from PG14-17, the patch could be created as-is,
> because...
>
> 1. WAL record for invalidation messages (XLOG_XACT_INVALIDATIONS) does not exist.
> 2. Thus the ReorderBufferChange for the invalidation does not exist.
> Our patch tries to distribute it but cannot be done as-is.
> 3. Codes assumed that invalidation messages can be added only once.
> 4. The timing when invalidation messages are consumed is limited:
> a. COMMAND_ID change is poped,
> b. start of decoding a transaction, or
> c. end of decoding a transaction.
>
> Above means that invalidations cannot be executed while being decoded.
> I created two patch sets to resolve the data loss issue. 0001 has less code
> changes but could resolve a part of issue, 0002 has huge changes but provides a
> complete solution.
>
> 0001 - mostly same as patches for other versions. ReorderBufferAddInvalidations()
> was adjusted to allow being called several times. As I said above,
> 0001 cannot execute inval messages while decoding the transacitons.
> 0002 - introduces new ReorderBufferChange type to indicate inval messages.
> It would be handled like PG14+.
>
> Here is an example. Assuming that the table foo exists on both nodes, a
> publication "pub" which publishes all tables, and a subscription "sub" which
> subscribes "pub". What if the workload is executed?
>
> ```
> S1 S2
> BEGIN;
> INSERT INTO foo VALUES (1)
> ALTER PUBLICATION pub RENAME TO pub_renamed;
> INSERT INTO foo VALUES (2)
> COMMIT;
> LR -> ?
> ```
>
> With 0001, tuples (1) and (2) would be replicated to the subscriber.
> An error "publication "pub" does not exist" would raise when new changes are done
> later.
>
> 0001+0002 works more aggressively; the error would raise when S1 transaction is decoded.
> The behavior is same as for patched PG14-PG17.
>

I understand that with 0001 the fix is partial in the sense that
because invalidations are processed at the transaction end, the
changes of concurrent DDL will only be reflected for the next
transaction. Now, on one hand, it is prudent to not add a new type of
ReorderBufferChange in the backbranch (v13) but the change is not that
invasive, so we can go with it as well. My preference would be to go
with just 0001 for v13 to minimize the risk of adding new bugs or
breaking something unintentionally.

Sawada-San, and others involved here, do you have any suggestions on
this matter?

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jakub Wartak 2025-03-18 10:19:32 Re: Draft for basic NUMA observability
Previous Message Bertrand Drouvot 2025-03-18 09:51:14 Re: Fwd: [BUG]: the walsender does not update its IO statistics until it exits