Re: long-standing data loss bug in initial sync of logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: long-standing data loss bug in initial sync of logical replication
Date: 2024-07-09 11:35:04
Message-ID: CAA4eK1+HOzhB35PVnmgL-OAtfLnQAsKM1shJODZi3sacTeae9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 1, 2024 at 10:51 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
>
> This issue is present in all supported versions. I was able to
> reproduce it using the steps recommended by Andres and Tomas's
> scripts. I also conducted a small test through TAP tests to verify the
> problem. Attached is the alternate_lock_HEAD.patch, which includes the
> lock modification(Tomas's change) and the TAP test.
>

@@ -1568,7 +1568,7 @@ OpenTableList(List *tables)
/* Allow query cancel in case this takes a long time */
CHECK_FOR_INTERRUPTS();

- rel = table_openrv(t->relation, ShareUpdateExclusiveLock);
+ rel = table_openrv(t->relation, ShareRowExclusiveLock);

The comment just above this code ("Open, share-lock, and check all the
explicitly-specified relations") needs modification. It would be
better to explain the reason of why we would need SRE lock here.

> To reproduce the issue in the HEAD version, we cannot use the same
> test as in the alternate_lock_HEAD patch because the behavior changes
> slightly after the fix to wait for the lock until the open transaction
> completes.
>

But won't the test that reproduces the problem in HEAD be successful
after the code change? If so, can't we use the same test instead of
slight modification to verify the lock mode?

> The attached issue_reproduce_testcase_head.patch can be
> used to reproduce the issue through TAP test in HEAD.
> The changes made in the HEAD version do not directly apply to older
> branches. For PG14, PG13, and PG12 branches, you can use the
> alternate_lock_PG14.patch.
>

Why didn't you include the test in the back branches? If it is due to
background psql stuff, then won't commit
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=187b8991f70fc3d2a13dc709edd408a8df0be055)
can address it?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2024-07-09 11:39:31 Re: Address the -Wuse-after-free warning in ATExecAttachPartition()
Previous Message Tomas Vondra 2024-07-09 11:34:26 Re: 回复: An implementation of multi-key sort