Re: Slow catchup of 2PC (twophase) transactions on replica in LR

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Давыдов Виталий <v(dot)davydov(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Date: 2024-07-04 11:31:40
Message-ID: CAA4eK1LhYsNq_0XKspfWKx4oXZpXk_q-A1Oa-tbA-wqQUudhCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 4, 2024 at 1:34 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> > >
> > > It succeeds if force_alter is also expressly set. Prepared transactions will be
> > > aborted at that time.
> > >
> > > ```
> > > subscriber=# ALTER SUBSCRIPTION sub SET (two_phase = off, force_alter =
> > on);
> > > ALTER SUBSCRIPTION
> > >
> >
> > Isn't it better to give a Notice when force_alter option leads to the
> > rollback of already prepared transactions?
>
> Indeed. I think this can be added for 0003. For now, it says like:
>
> ```
> postgres=# ALTER SUBSCRIPTION sub SET (TWO_PHASE = off, FORCE_ALTER = on);
> WARNING: requested altering to two_phase = false but there are prepared transactions done by the subscription
> DETAIL: Such transactions are being rollbacked.
> ALTER SUBSCRIPTION
>

Is it possible to get a NOTICE instead of a WARNING?

>
> > I have another question on the latest 0001 patch:
> > + /*
> > + * Stop all the subscription workers, just in case.
> > + * Workers may still survive even if the subscription is
> > + * disabled.
> > + */
> > + logicalrep_workers_stop(subid);
> >
> > In which case the workers will survive when the subscription is disabled?
>
> I think both normal and tablesync worker can survive, because ALTER SUBSCRIPTION
> DISABLE command does not send signal to workers. It just change the system catalog.
> logicalrep_workers_stop() is added to ensure all workers are stopped.
>
> Actually, earlier version (-v3) did not have a mechanism but they sometimes got
> assertion failures in maybe_reread_subscription(). This was because the survived
> workers read pg_subscription catalog and failed below assertion:
>
> ```
> /* two-phase cannot be altered while the worker exists */
> Assert(newsub->twophasestate == MySubscription->twophasestate);
> ```
>

But that is not a good reason for this operation to stop workers
first. Instead, we should prohibit this operation if any worker is
present. The reason is that there is always a chance that if any
worker is alive, it can prepare a new transaction after we have
checked for the presence of any prepared transactions.

Comments:
=========
1.
There is no need to do something remarkable regarding
+ * the "false" to "true" case; the backend process alters
+ * subtwophase <funny_char> to LOGICALREP_TWOPHASE_STATE_PENDING once.
+ * After the subscription is enabled, a new logical
+ * replication worker requests to change the two_phase
+ * option of its slot from pending to true when the
+ * initial data synchronization is done. The code path is
+ * the same as the case in which two_phase <funny_char> is initially
+ * set <funny_char> to true.

The patch has some funny characters in the above comment at the places
highlighted by me. It seems you have copied from some editor that has
inserted such characters.

2.
/*
* Do not allow toggling of two_phase option. Doing so could cause
* missing of transactions and lead to an inconsistent replica.
* See comments atop worker.c
*
* Note: Unsupported twophase indicates that this call originated
* from AlterSubscription.
*/
if (!IsSet(supported_opts, SUBOPT_TWOPHASE_COMMIT))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("unrecognized subscription parameter: \"%s\"", defel->defname)));

This part of the code must either be removed or converted to an assert.

3. The tests added in 099_twophase_added.pl should be part of 021_twophase.pl

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2024-07-04 11:35:00 Re: Make query cancellation keys longer
Previous Message Bertrand Drouvot 2024-07-04 11:30:17 Re: Pluggable cumulative statistics