From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: long-standing data loss bug in initial sync of logical replication |
Date: | 2024-07-10 16:37:29 |
Message-ID: | CALDaNm1yRdMmGE+RO+Friy=9ac2cFDpdZ4Tx1FoonCFx3XRt6w@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, 10 Jul 2024 at 12:28, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jul 9, 2024 at 8:14 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Tue, 9 Jul 2024 at 17:05, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Jul 1, 2024 at 10:51 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > >
> > > >
> > > > This issue is present in all supported versions. I was able to
> > > > reproduce it using the steps recommended by Andres and Tomas's
> > > > scripts. I also conducted a small test through TAP tests to verify the
> > > > problem. Attached is the alternate_lock_HEAD.patch, which includes the
> > > > lock modification (Tomas's change) and the TAP test.
> > > >
> > >
> > > @@ -1568,7 +1568,7 @@ OpenTableList(List *tables)
> > > /* Allow query cancel in case this takes a long time */
> > > CHECK_FOR_INTERRUPTS();
> > >
> > > - rel = table_openrv(t->relation, ShareUpdateExclusiveLock);
> > > + rel = table_openrv(t->relation, ShareRowExclusiveLock);
> > >
> > > The comment just above this code ("Open, share-lock, and check all the
> > > explicitly-specified relations") needs modification. It would be
> > > better to explain why we would need the SRE lock here.
> >
> > Updated comments for the same.
> >
>
> The patch did not use ShareRowExclusiveLock for partitions; see the
> attached. I haven't tested it, but they should face the same
> problem. Apart from that, I have changed the comments in a few places
> in the patch.
I could not hit the updated ShareRowExclusiveLock code path with a
partitioned table; instead, I verified it using an inheritance
table. I have added a test for that case and am also attaching the
back-branch patches.
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
v4-0001-Fix-data-loss-during-initial-sync-in-logical-repl_HEAD.patch | text/x-patch | 8.4 KB |
v4-0001-Fix-data-loss-during-initial-sync-in-logical-repl_PG14.patch | text/x-patch | 9.1 KB |
v4-0001-Fix-data-loss-during-initial-sync-in-logical-repl_PG12.patch | text/x-patch | 9.0 KB |
v4-0001-Fix-data-loss-during-initial-sync-in-logical-repl_PG13.patch | text/x-patch | 9.0 KB |
v4-0001-Fix-data-loss-during-initial-sync-in-logical-repl_PG16.patch | text/x-patch | 8.5 KB |
v2_issue_reproduce_testcase_head.patch | text/x-patch | 3.6 KB |