Re: Slow catchup of 2PC (twophase) transactions on replica in LR

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Давыдов Виталий <v(dot)davydov(at)postgrespro(dot)ru>
Subject: Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Date: 2024-05-14 04:18:45
Message-ID: CAHut+PssnvZ86cZw5WGEoOU3mAzzMEBwYcyFG9Th0XM=n49H5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Kuroda-san, Here are some review comments for all patches v9*

//////////
Patch v9-0001
//////////

There were no changes since v8-0001, so no comments.

//////////
Patch v9-0002
//////////

======
Commit Message

2.1.
Regarding the off->on case, the logical replication already has a
mechanism for it, so there is no need to do anything special for the
on->off case; however, we must connect to the publisher and expressly
change the parameter. The operation cannot be rolled back, and
altering the parameter from "on" to "off" within a transaction is
prohibited.

In the opposite case, there is no need to prevent this because the
logical replication worker already had the mechanism to alter the slot
option at a convenient time.

~

This explanation seems to be going around in circles, without giving
any new information:

AFAICT, "Regarding the off->on case, the logical replication already
has a mechanism for it, so there is no need to do anything special for
the on->off case;"

is saying pretty much the same as:

"In the opposite case, there is no need to prevent this because the
logical replication worker already had the mechanism to alter the slot
option at a convenient time."

But, what I hoped for in previous review comments was an explanation
somewhat less vague than "already has a mechanism" or "already had the
mechanism". Can't this have just 1 or 2 lines to say WHAT is that
existing mechanism for the "off" to "on" case, and WHY that means
there is nothing special to do in that scenario?

======
src/backend/commands/subscriptioncmds.c

2.2. AlterSubscription

/*
- * The changed two_phase option of the slot can't be rolled
- * back.
+ * Since altering the two_phase option of subscriptions
+ * also leads to changing the slot option, this command
+ * cannot be rolled back. So prevent this if we are in a
+ * transaction block. In the opposite case, there is no
+ * need to prevent this because the logical replication
+ * worker already had the mechanism to alter the slot
+ * option at a convenient time.
*/

(Same previous review comments, and same as my review comment for the
commit message above).

I don't think "already had the mechanism" is enough explanation.

Also, the 2nd sentence doesn't make sense here because the comment
only said "altering the slot option" -- it didn't say it was altering
it to "on" or altering it to "off", so "the opposite case" has no
meaning.

~~~

2.3. AlterSubscription

/*
- * Try to acquire the connection necessary for altering slot.
+ * Check the need to alter the replication slot. Failover and two_phase
+ * options are controlled by both the publisher (as a slot option) and the
+ * subscriber (as a subscription option). The slot option must be altered
+ * only when changing "on" to "off". Because in opposite case, the logical
+ * replication worker already has the mechanism to do so at a convenient
+ * time.
+ */
+ update_failover = replaces[Anum_pg_subscription_subfailover - 1];
+ update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - 1] &&
+ !opts.twophase);

This is again the same as other review comments above. Probably, when
some better explanation can be found for "already has the mechanism to
do so at a convenient time." then all of these places can be changed
using similar text.

//////////
Patch v9-0003
//////////

There are some imperfect code comments but AFAIK they are the same
ones from patch 0002. I think patch 0003 is just moving those comments
to different places, so probably they would already be addressed by
patch 0002.

//////////
Patch v9-0004
//////////

======
doc/src/sgml/catalogs.sgml

4.1.
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>subforcealter</structfield> <type>bool</type>
+ </para>
+ <para>
+ If true, the subscription can be altered <literal>two_phase</literal>
+ option, even if there are prepared transactions
+ </para></entry>
+ </row>
+

BEFORE
If true, the subscription can be altered <literal>two_phase</literal>
option, even if there are prepared transactions

SUGGESTION
If true, then the ALTER SUBSCRIPTION command can disable
<literal>two_phase</literal> option, even if there are uncommitted
prepared transactions from when <literal>two_phase</literal> was
enabled

======
doc/src/sgml/ref/alter_subscription.sgml

4.2.
-
- <para>
- The <literal>two_phase</literal> parameter can only be altered when the
- subscription is disabled. When altering the parameter from
<literal>on</literal>
- to <literal>off</literal>, the backend process checks for any incomplete
- prepared transactions done by the logical replication worker (from when
- <literal>two_phase</literal> parameter was still <literal>on</literal>)
- and, if any are found, those are aborted.
- </para>

Well, I still think there ought to be some mention of the relationship
between 'force_alter' and 'two_phase' given on this ALTER SUBSCRIPTION
page. Then the user can cross-reference to read what the 'force_alter'
actually does.

======
doc/src/sgml/ref/create_subscription.sgml

4.3.
+
+ <varlistentry id="sql-createsubscription-params-with-force-alter">
+ <term><literal>force_alter</literal> (<type>boolean</type>)</term>
+ <listitem>
+ <para>
+ Specifies whether the subscription can be altered
+ <literal>two_phase</literal> option, even if there are prepared
+ transactions. If specified, the backend process checks for any
+ incomplete prepared transactions done by the logical replication
+ worker (from when <literal>two_phase</literal> parameter was still
+ <literal>on</literal>), if any are found, those are aborted.
+ Otherwise, Altering the parameter from <literal>on</literal> to
+ <literal>off</literal> will give an error when there are prepared
+ transactions done by the logical replication worker.
+ The default is <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>

This explanation seems a bit repetitive. I think it can be improved as follows:

SUGGESTION
Specifies if the ALTER SUBSCRIPTION can be forced to proceed instead
of giving an error.

There is currently only one scenario where this parameter has any
effect: When altering two_phase option from true to false it is
possible for there to be incomplete prepared transactions done by the
logical replication worker (from when two_phase parameter was still
true). If force_alter is false, then this will give an error; if
force_alter is true, then the incomplete prepared transactions are
aborted and the alter will proceed.

The default is false.

======
src/backend/commands/subscriptioncmds.c

4.4. CreateSubscription

values[Anum_pg_subscription_subfailover - 1] = BoolGetDatum(opts.failover);
+ values[Anum_pg_subscription_subforcealter] = BoolGetDatum(opts.force_alter);
values[Anum_pg_subscription_subconninfo - 1] =

Hmm, looks like a bug. Shouldn't that index say -1?

~~~
4.5. AlterSubscription

+ /*
+ * Abort prepared transactions only if
+ * 'force_alter' option is true. Otherwise raise
+ * an ERROR.
+ */
+ if (IsSet(opts.specified_opts, SUBOPT_FORCE_ALTER))
+ {
+ if (!opts.force_alter)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot alter %s when there are prepared transactions",
+ "two_phase = off"),
+ errhint("Resolve these transactions or set %s, and then try again.",
+ "force_alter = true")));
+ }
+ else
+ {
+ if (!sub->forcealter)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot alter %s when there are prepared transactions",
+ "two_phase = off"),
+ errhint("Resolve these transactions or set %s, and then try again.",
+ "force_alter = true")));
+ }
+

IIUC this code can be simplified to remove the error duplication.
Something like below:

SUGGESTION

bool raise_error = IsSet(opts.specified_opts, SUBOPT_FORCE_ALTER) ?
!opts.force_alter : !sub->forcealter;

if (raise_error)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot alter %s when there are prepared transactions",
"two_phase = off"),
errhint("Resolve these transactions or set %s, and then try again.",
"force_alter = true")));

======
src/bin/pg_dump/pg_dump.c

4.6. getSubscriptions

+ if (fout->remoteVersion >= 170000)
+ appendPQExpBufferStr(query,
+ " s.subforcealter\n");
+ else
+ appendPQExpBuffer(query,
+ " false AS subforcealter\n");
+
+

4.6a.
Should this just be combined with the existing "if
(fout->remoteVersion >= 170000)" for failover?

~

4.6b.
Double blank lines.

======
src/bin/psql/describe.c

4.7.
+ if (pset.sversion >= 170000)
+ appendPQExpBuffer(&buf,
+ ", subforcealter AS \"%s\"\n",
+ gettext_noop("Force_alter"));

IMO the column title should be "Force alter" (i.e. without the underscore)

======
src/include/catalog/pg_subscription.h

4.8. CATALOG

+ bool subforcealter; /* True if we allow to drop prepared transactions
+ * when altering two_phase "on"->"off". */

I think this is not actually the description of 'force_alter'. What
you wrote just happens to be the only case that this option currently
works for. Maybe a more correct description is something like below.

SUGGESTION
True allows the ALTER SUBSCRIPTION command to proceed under conditions
that would otherwise result in an error. Currently, 'force_alter' only
has an effect when altering the two_phase option from "true" to
"false".

~~~

4.9. struct Subscription

+ bool forcealter; /* True if we allow to drop prepared
+ * transactions when altering two_phase
+ * "on"->"off". */

Ditto the previous review comment.

======
src/test/regress/expected/subscription.out

4.10.
-
List of subscriptions
- Name | Owner | Enabled | Publication
| Binary | Streaming | Two-phase commit | Disable on error | Origin |
Password required | Run as owner? | Failover | Synchronous commit |
Conninfo | Skip LSN
-------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub4 | regress_subscription_user | f | {testpub}
| f | off | d | f | none |
t | f | f | off |
dbname=regress_doesnotexist | 0/0
+
List of
subscriptions
+ Name | Owner | Enabled | Publication
| Binary | Streaming | Two-phase commit | Disable on error | Origin |
Password required | Run as owner? | Failover | Force_alter |
Synchronous commit | Conninfo | Skip LSN
+------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+-------------+--------------------+-----------------------------+----------

The column heading should be "Force alter", as already mentioned by an
earlier review comment (#4.7)

======
src/test/subscription/t/099_twophase_added.pl

4.11.

+# Alter the two_phase with the force_alter option. Apart from the the last
+# ALTER SUBSCRIPTION command, the command will abort the prepared transaction
+# and succeed.

There is typo "the the" and the wording is a bit strange. Why not just say:

SUGGESTION
Alter the two_phase true to false with the force_alter option enabled.
This command will succeed after aborting the prepared transaction.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-05-14 05:12:38 Re: [PATCH] Fix bug when calling strncmp in check_authmethod_valid
Previous Message David Rowley 2024-05-14 04:10:52 Re: JIT compilation per plan node