Re: long-standing data loss bug in initial sync of logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Nitin Motiani <nitinmotiani(at)google(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: long-standing data loss bug in initial sync of logical replication
Date: 2024-07-17 06:24:45
Message-ID: CAA4eK1+xg2bW1Ey6onoKvkHbdDvq224wazNvGKmoBdGFHxuNMw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 16, 2024 at 6:54 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Tue, 16 Jul 2024 at 11:59, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Jul 16, 2024 at 9:29 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > One related comment:
> > > @@ -1219,8 +1219,14 @@ AlterPublicationTables(AlterPublicationStmt *stmt, HeapTuple tup,
> > > oldrel = palloc(sizeof(PublicationRelInfo));
> > > oldrel->whereClause = NULL;
> > > oldrel->columns = NIL;
> > > +
> > > + /*
> > > + * Data loss due to concurrency issues is avoided by locking
> > > + * the relation in ShareRowExclusiveLock as described atop
> > > + * OpenTableList.
> > > + */
> > > oldrel->relation = table_open(oldrelid,
> > > - ShareUpdateExclusiveLock);
> > > + ShareRowExclusiveLock);
> > >
> > > Isn't it better to lock the required relations in RemovePublicationRelById()?
> > >
> >
> > On my CentOS VM, the test file '100_bugs.pl' takes ~11s without the
> > patch and ~13.3s with it, so the newly added tests cost an extra 2 to
> > 2.3s. It isn't worth adding this much time for one bug fix. Can we
> > combine the table and schema tests into a single test and drop the
> > inheritance table tests, since the code for those will mostly follow
> > the same path as for a regular table?
>
> Yes, that is better. The attached v6 patch includes these changes.
> It also addresses the comments from [1].
>

Thanks, I don't see any noticeable difference in test timing with the new
tests. I have slightly modified the comments in the attached diff
patch (please rename it to .patch).

BTW, I noticed that we don't take any table-level locks for CREATE
PUBLICATION .. FOR ALL TABLES (or for DROP PUBLICATION). Could that
create a similar problem? I haven't tested it, so I'm not sure, but even
if there is a problem in the CREATE case, it should lead to an ERROR
such as a missing publication.

--
With Regards,
Amit Kapila.

Attachment: v6-topup-amit.patch.txt (text/plain, 1.8 KB)
