Re: long-standing data loss bug in initial sync of logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: long-standing data loss bug in initial sync of logical replication
Date: 2024-07-10 06:58:41
Message-ID: CAA4eK1Jw5g2pC-MEQE93fozv=o=YSoAH0++W0MkL0PkXxnyX=g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 9, 2024 at 8:14 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Tue, 9 Jul 2024 at 17:05, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Jul 1, 2024 at 10:51 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > >
> > > This issue is present in all supported versions. I was able to
> > > reproduce it using the steps recommended by Andres and Tomas's
> > > scripts. I also conducted a small test through TAP tests to verify the
> > > problem. Attached is alternate_lock_HEAD.patch, which includes the
> > > lock modification (Tomas's change) and the TAP test.
> > >
> >
> > @@ -1568,7 +1568,7 @@ OpenTableList(List *tables)
> > /* Allow query cancel in case this takes a long time */
> > CHECK_FOR_INTERRUPTS();
> >
> > - rel = table_openrv(t->relation, ShareUpdateExclusiveLock);
> > + rel = table_openrv(t->relation, ShareRowExclusiveLock);
> >
> > The comment just above this code ("Open, share-lock, and check all the
> > explicitly-specified relations") needs modification. It would be
> > better to explain why we need the SRE lock here.
>
> Updated comments for the same.
>

The patch did not use ShareRowExclusiveLock for partitions; see the
attached version. I haven't tested that case, but partitions should hit
the same problem. Apart from that, I have changed the comments in a few
places in the patch.
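
To illustrate the lock-level change quoted above, here is a minimal
standalone sketch of the table-level lock conflict rules as given in the
PostgreSQL documentation. It is not PostgreSQL source: the names
LockModeSketch, conflict_tab and locks_conflict are hypothetical and used
only for illustration. It assumes that ordinary INSERT/UPDATE/DELETE hold
RowExclusiveLock on the table, and shows that ShareUpdateExclusiveLock
does not conflict with that mode while ShareRowExclusiveLock does, so
with the stronger lock the relation cannot be opened here while such
writers still hold their locks.

```c
#include <stdbool.h>

/* Table-level lock modes, weakest to strongest (illustrative names). */
typedef enum {
    ACCESS_SHARE = 1,
    ROW_SHARE,
    ROW_EXCLUSIVE,           /* taken by INSERT / UPDATE / DELETE */
    SHARE_UPDATE_EXCLUSIVE,  /* lock level before the patch */
    SHARE,
    SHARE_ROW_EXCLUSIVE,     /* lock level after the patch */
    EXCLUSIVE,
    ACCESS_EXCLUSIVE
} LockModeSketch;

#define BIT(m) (1u << (m))

/* conflict_tab[m] is the set of modes that conflict with mode m,
 * transcribed from the documented conflict matrix. Index 0 is unused. */
const unsigned conflict_tab[9] = {
    [ACCESS_SHARE] = BIT(ACCESS_EXCLUSIVE),
    [ROW_SHARE] = BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [ROW_EXCLUSIVE] = BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) |
                      BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [SHARE_UPDATE_EXCLUSIVE] = BIT(SHARE_UPDATE_EXCLUSIVE) | BIT(SHARE) |
                               BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) |
                               BIT(ACCESS_EXCLUSIVE),
    [SHARE] = BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE) |
              BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) |
              BIT(ACCESS_EXCLUSIVE),
    [SHARE_ROW_EXCLUSIVE] = BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE) |
                            BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) |
                            BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [EXCLUSIVE] = BIT(ROW_SHARE) | BIT(ROW_EXCLUSIVE) |
                  BIT(SHARE_UPDATE_EXCLUSIVE) | BIT(SHARE) |
                  BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) |
                  BIT(ACCESS_EXCLUSIVE),
    [ACCESS_EXCLUSIVE] = BIT(ACCESS_SHARE) | BIT(ROW_SHARE) |
                         BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE) |
                         BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) |
                         BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
};

/* True if a request for mode `requested` must wait while another
 * session holds mode `held` on the same table. */
bool locks_conflict(LockModeSketch requested, LockModeSketch held)
{
    return (conflict_tab[requested] & BIT(held)) != 0;
}
```

With this table, locks_conflict(SHARE_UPDATE_EXCLUSIVE, ROW_EXCLUSIVE) is
false and locks_conflict(SHARE_ROW_EXCLUSIVE, ROW_EXCLUSIVE) is true,
which is the difference the one-line change in OpenTableList relies on.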

--
With Regards,
Amit Kapila.

Attachment Content-Type Size
v3-0001-Fix-data-loss-during-initial-sync-in-logical-repl.patch application/octet-stream 6.5 KB
