Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber
Date: 2024-11-08 03:05:38
Message-ID: CAA4eK1+Wb=tcZgny0yCQkJotBRcUWvawcWJStCn8YsPb1AP5YA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 8, 2024 at 12:53 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I came across this commit while preparing release notes, and I'm
> worried about whether it doesn't create more problems than it solves.
> The intent stated in the thread subject is to prevent an apply worker
> from advancing past a prepared transaction if the subscriber side
> doesn't permit prepared transactions. However, it appears to me that
> the committed patch doesn't permit an apply worker to advance past
> any failing transaction whatsoever. Was any thought given to how
> a DBA would get out of such a situation and get replication flowing
> again? In the prepared-xact case, it's at least clear that you
> could increase max_prepared_transactions and restart the subscriber
> installation. In the general case, it's not very obvious that you'd
> even know what the problem is let alone have an easy way to fix it.
>

This is by design: we don't let the apply worker proceed after any
ERROR. For example, the apply worker keeps retrying a transaction that
fails with a unique key violation (which could happen because the
subscriber has a unique key that the publisher doesn't, or because the
subscriber already has a row with the same value). Manual intervention
is needed to resume replication. To prepare for that, the DBA can
create the subscription with the option 'disable_on_error'; the apply
worker will then stop on ERROR instead of retrying. She can either
manually remove the conflicting row or use ALTER SUBSCRIPTION ... SKIP
... to get past the conflicting/erroring transaction. Alternatively,
the transaction can be skipped by calling the
pg_replication_origin_advance() function. To use SKIP or the
origin-advance function, she needs the LSN, which we print in the
CONTEXT line of the ERROR log (e.g., CONTEXT: processing remote data
for replication origin "pg_16395" during "INSERT" for replication
target relation "public.test" in transaction 725 finished at
0/14C0378). Here, the LSN value 0/14C0378 is what she would use to
skip the failed transaction. We have explained this in the docs [1].
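
To make the workflow concrete, here is a rough sketch of the commands
involved (the subscription name 'sub1', the connection string, and the
publication name are illustrative placeholders; the LSN and origin name
are taken from the example CONTEXT line above):

```sql
-- Create the subscription so the apply worker stops, rather than
-- retries, when it hits an apply error.
CREATE SUBSCRIPTION sub1
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub1
    WITH (disable_on_error = true);

-- After an error, skip the failed remote transaction using the
-- "finished at" LSN reported in the CONTEXT line of the error message.
ALTER SUBSCRIPTION sub1 SKIP (lsn = '0/14C0378');
ALTER SUBSCRIPTION sub1 ENABLE;

-- Alternative: advance the replication origin directly. The origin
-- name ("pg_16395") also comes from the CONTEXT line, and per the
-- docs [1] the origin should be advanced to the next LSN after the
-- reported finish LSN. Use with care: careless advancing can make
-- the subscriber inconsistent.
ALTER SUBSCRIPTION sub1 DISABLE;
SELECT pg_replication_origin_advance('pg_16395', '0/14C0379');
ALTER SUBSCRIPTION sub1 ENABLE;
```
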

> In other words: I thought the original design here was to
> intentionally ignore apply errors and keep going, on the theory that
> that was better than blocking replication altogether
>

No, that was not the intention, because otherwise we would silently
create inconsistency on the subscriber side.

> . This commit
> has reversed that decision, on the strength of little or no
> discussion AFAICS. Are we really ready to push this into minor
> releases of stable branches? Is it a good idea even on HEAD?
>

I hope the explanation above addresses your concern.

[1] - https://www.postgresql.org/docs/devel/logical-replication-conflicts.html

--
With Regards,
Amit Kapila.
