Re: [BUG?] check_exclusion_or_unique_constraint false negative

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: [BUG?] check_exclusion_or_unique_constraint false negative
Date: 2024-08-02 04:56:40
Message-ID: CAA4eK1Jfb0xviXYon-_TvHNKeAY7ngAeo++Knu-0RPR6EkSBjA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 1, 2024 at 2:55 PM Michail Nikolaev
<michail(dot)nikolaev(at)gmail(dot)com> wrote:
>
> > Thanks for pointing out the issue!
>
> Thanks for your attention!
>
> > IIUC, the issue can happen when two concurrent transactions using DirtySnapshot access
> > the same tuples, which is not specific to the parallel apply
>
> Not exactly. It happens for any DirtySnapshot scan over a B-tree index while some other transaction is updating the same index page (even if that transaction uses an MVCC snapshot).
>
> So, the logical-replication scenario looks like this:
>
> * the subscriber worker receives a tuple update/delete from the publisher
> * it calls RelationFindReplTupleByIndex to find the tuple in the local table
> * some other transaction updates the tuple in the local table (on the subscriber side) in parallel
> * RelationFindReplTupleByIndex may not find the tuple because it uses DirtySnapshot
> * the update/delete is lost
>
> Parallel apply mode looks more dangerous because it uses multiple workers on the subscriber side, so the probability of hitting the issue is higher.
> In that case, "some other transaction" is just another worker applying the changes of a different transaction in parallel.
>
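
The scenario quoted above can be illustrated with a toy model. This is
not PostgreSQL code: ToyTable, dirty_index_scan, and the between_steps
hook are hypothetical names, and the visibility rule is a heavily
simplified stand-in for a dirty snapshot. The interleaving shown (the
scan copies index entries, then a concurrent update commits before the
heap check) is an assumption about how the miss could occur, not a
confirmed trace.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class TupleVersion:
    key: int
    value: str
    xmin: int                   # id of the inserting transaction
    xmax: Optional[int] = None  # id of the updating/deleting transaction


class ToyTable:
    """Hypothetical single-page model of a heap plus one index page."""

    def __init__(self) -> None:
        self.heap: List[TupleVersion] = []          # list position acts as the TID
        self.index_page: List[Tuple[int, int]] = [] # (key, tid) entries
        self.committed: Set[int] = set()            # committed transaction ids

    def insert(self, xid: int, key: int, value: str) -> None:
        self.heap.append(TupleVersion(key, value, xmin=xid))
        self.index_page.append((key, len(self.heap) - 1))

    def update(self, xid: int, key: int, value: str) -> None:
        # Stamp the current live version with xmax, then add a new version.
        for tup in self.heap:
            if tup.key == key and tup.xmax is None:
                tup.xmax = xid
                break
        self.insert(xid, key, value)

    def commit(self, xid: int) -> None:
        self.committed.add(xid)

    def dirty_visible(self, tup: TupleVersion) -> bool:
        # Simplified rule (assumption): a version is visible unless it was
        # deleted by a committed transaction. Aborts are not modelled.
        return tup.xmax is None or tup.xmax not in self.committed

    def dirty_index_scan(
        self, key: int, between_steps: Optional[Callable[[], None]] = None
    ) -> Optional[TupleVersion]:
        # Step 1: copy the matching index entries from the page.
        candidates = [tid for k, tid in self.index_page if k == key]
        # Hook that lets a "concurrent" transaction run at this point.
        if between_steps is not None:
            between_steps()
        # Step 2: check each copied entry against the heap.
        for tid in candidates:
            if self.dirty_visible(self.heap[tid]):
                return self.heap[tid]
        return None


table = ToyTable()
table.insert(xid=1, key=42, value="v1")
table.commit(1)


def concurrent_update() -> None:
    table.update(xid=2, key=42, value="v2")
    table.commit(2)


quiet = table.dirty_index_scan(42)                                # no concurrency
racy = table.dirty_index_scan(42, between_steps=concurrent_update)
rescan = table.dirty_index_scan(42)                               # after the race
```

In this model the racy scan comes back empty even though a live row
with key 42 exists at every moment: the old version is already dead
when it is checked, and the new version's index entry was added after
the page was read. A later rescan finds the new version.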

I think this is less likely, or perhaps not possible, in the parallel
apply case, because such conflicting updates (updates of the same
tuple) should already be serialized at the publisher. So one of the
updates will come only after the commit of the transaction that
contains the other update.

I haven't tried a test based on your description of the general
problem with DirtySnapshot scans, so I could be wrong. In the case of
logical replication, we will LOG an update_missing type of conflict,
and the user may need to take some manual action based on that. I am
not sure we can do anything specific to logical replication for this,
but feel free to suggest ideas for solving the problem, either in
general or specifically for logical replication.

--
With Regards,
Amit Kapila.
