RE: Slow catchup of 2PC (twophase) transactions on replica in LR

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Peter Smith' <smithpb2250(at)gmail(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Давыдов Виталий <v(dot)davydov(at)postgrespro(dot)ru>
Subject: RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Date: 2024-05-16 05:02:55
Message-ID: OSBPR01MB25522052F9F3E3AAD3BA2A8CF5ED2@OSBPR01MB2552.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Peter,

Thanks for reviewing! Here are new patches.

>
> //////////
> patch v10-0002
> //////////
>
> ======
> Commit message
>
> 2.1.
> Regarding the false->true case, the backend process alters the subtwophase to
> LOGICALREP_TWOPHASE_STATE_PENDING once. After the subscription is
> enabled, a new
> logical replication worker requests to change the two_phase option of its slot
> from pending to true after the initial data synchronization is done. The code
> path is the same as the case in which two_phase is initially set to true, so
> there is no need to do something remarkable. However, for the true->false case,
> the backend must connect to the publisher and expressly change the parameter
> because the apply worker does not alter the option to false. The
> operation cannot
> be rolled back, and altering the parameter from "true" to "false" within a
> transaction is prohibited.
>
> ~
>
> BEFORE
> The operation cannot be rolled back, and altering the parameter from
> "true" to "false" within a transaction is prohibited.
>
> SUGGESTION
> Because this operation cannot be rolled back, altering the two_phase
> parameter from "true" to "false" within a transaction is prohibited.

Fixed.

>
> ======
> doc/src/sgml/ref/alter_subscription.sgml
>
> 2.2.
> <command>ALTER SUBSCRIPTION ... SET (failover = on|off)</command>
> and
> - <command>ALTER SUBSCRIPTION ... SET (two_phase =
> on|off)</command>
> + <command>ALTER SUBSCRIPTION ... SET (two_phase = off)</command>
>
> I wasn't sure why you chose to keep on|off here instead of true|false,
> since in subsequence patch 0003 you changed it true/false everywhere
> as discussed in previous reviews.
>
> OTOH if you only did this to be consistent with the "failover=on|off"
> then that is OK; but in that case I might raise a separate hackers
> thread to propose those should also be changed to true|false for
> consistency with the parameer listed on the CREATE SUBSCRIPTION page.
> What do you think?

Yeah, I did not change here, because other parameters were notated as
on/off. I found you started the forked thread [1] so I will revise the patch
after it was accepted.

>
> ======
> src/backend/commands/subscriptioncmds.c
>
> 2.3.
> /*
> - * The changed two_phase option of the slot can't be rolled
> - * back.
> + * Altering the parameter from "true" to "false" within a
> + * transaction is prohibited. Since the apply worker does
> + * not alter the slot option to false, the backend must
> + * connect to the publisher and expressly change the
> + * parameter.
> + *
> + * There is no need to do something remarkable regarding
> + * the "false" to "true" case; the backend process alters
> + * subtwophase to LOGICALREP_TWOPHASE_STATE_PENDING once.
> + * After the subscription is enabled, a new logical
> + * replication worker requests to change the two_phase
> + * option of its slot when the initial data synchronization
> + * is done. The code path is the same as the case in which
> + * two_phase is initially set to true.
> */
>
> BEFORE
> ...worker requests to change the two_phase option of its slot when...
>
> SUGGESTION
> ...worker requests to change the two_phase option of its slot from
> pending to true when...

Fixed.

>
> ======
> src/test/subscription/t/099_twophase_added.pl
>
> 2.4.
> +#####################
> +# Check the case that prepared transactions exist on the publisher node.
> +#
> +# Since the two_phase is "off", then normally, this PREPARE will do nothing
> +# until the COMMIT PREPARED, but in this test, we toggle the two_phase to
> +# "true" again before the COMMIT PREPARED happens.
>
> /Since the two_phase is "off"/Since the two_phase is "false"/

Fixed.

>
> //////////
> patch v10-0003
> //////////
>
> ======
> src/backend/commands/subscriptioncmds.c
>
> 3.1. AlterSubscription
>
> + * If two_phase was enabled, there is a possibility that
> + * transactions have already been PREPARE'd. They must be
> + * checked and rolled back.
> */
> if (!opts.twophase)
>
> I think it will less ambiguous if you modify this to say "If two_phase
> was previously enabled"

Fixed.

>
> ~~~
>
> 3.2.
> if (!opts.twophase)
> {
> List *prepared_xacts;
>
> /*
> * Altering the parameter from "true" to "false" within
> * a transaction is prohibited. Since the apply worker
> * does not alter the slot option to false, the backend
> * must connect to the publisher and expressly change
> * the parameter.
> *
> * There is no need to do something remarkable
> * regarding the "false" to "true" case; the backend
> * process alters subtwophase to
> * LOGICALREP_TWOPHASE_STATE_PENDING once. After the
> * subscription is enabled, a new logical replication
> * worker requests to change the two_phase option of
> * its slot when the initial data synchronization is
> * done. The code path is the same as the case in which
> * two_phase is initially set to true.
> */
> if (!opts.twophase)
> PreventInTransactionBlock(isTopLevel,
> "ALTER SUBSCRIPTION ... SET (two_phase = false)");
>
> /*
> * To prevent prepared transactions from being
> * isolated, they must manually be aborted.
> */
> if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
> (prepared_xacts = GetGidListBySubid(subid)) != NIL)
> {
> /* Abort all listed transactions */
> foreach_ptr(char, gid, prepared_xacts)
> FinishPreparedTransaction(gid, false);
>
> list_free_deep(prepared_xacts);
> }
> }
>
> /* Change system catalog acoordingly */
> values[Anum_pg_subscription_subtwophasestate - 1] =
> CharGetDatum(opts.twophase ?
> LOGICALREP_TWOPHASE_STATE_PENDING :
> LOGICALREP_TWOPHASE_STATE_DISABLED);
> replaces[Anum_pg_subscription_subtwophasestate - 1] = true;
> }
>
> ~
>
> Why is "if (!opts.twophase)" being checked at the top and then
> immediately being checed again here:
> + if (!opts.twophase)
> + PreventInTransactionBlock(isTopLevel,
> + "ALTER SUBSCRIPTION ... SET (two_phase = false)");

Oh, this was caused by wrong git operations.

> And then again here:
> CharGetDatum(opts.twophase ?
> LOGICALREP_TWOPHASE_STATE_PENDING :
> LOGICALREP_TWOPHASE_STATE_DISABLED);
>
> There is no need to re-check a flag that was already checked, so
> clearly some of this logic/code is either wrong or redundant.

Right. I added a new variable to store the value to be changed. Thouth?

>
> ======
> src/test/subscription/t/099_twophase_added.pl
>
> (Let's change these on|off to true|false to match what you did already
> in patch 0002).
>
> 3.3.
> +#####################
> +# Check the case that prepared transactions exist on the subscriber node
> +#
> +# If the two_phase is altering from "on" to "off" and there are prepared
> +# transactions on the subscriber, they must be aborted. This test checks it.
>
>
> /off/false/
>
> /on/true/

Fixed.

>
> ~~~
>
> 3.4.
> +# Verify the prepared transaction has been replicated to the subscriber because
> +# two_phase is set to "on".
>
> /on/true/

Fixed.

>
> ~~~
>
> 3.5.
> +# Toggle the two_phase to "off" before the COMMIT PREPARED
> +$node_subscriber->safe_psql(
> + 'postgres', "
> + ALTER SUBSCRIPTION regress_sub DISABLE;
> + ALTER SUBSCRIPTION regress_sub SET (two_phase = off);
> + ALTER SUBSCRIPTION regress_sub ENABLE;");
>
> /off/false/
>
> /two_phase = off/two_phase = false/

Fixed.

>
> ~~~
>
> 3.6.
> +# Verify any prepared transactions are aborted because two_phase is changed
> to
> +# "off".
>
> /off/false/

Fixed.

>
> //////////
> patch v10-0004
> //////////
>
> ======
> 4.g1. GENERAL - document rendering fails
>
> FYI - The document failed to build after I apply patch 0003. Did you try it?
>
> In my environment it reported some unbalanced tags:
>
> ref/create_subscription.sgml:448: parser error : Opening and ending
> tag mismatch: link line 436 and para
> </para>
> ^
> ref/create_subscription.sgml:449: parser error : Opening and ending
> tag mismatch: para line 435 and listitem
> </listitem>
>
> etc.

Oh, I forgot to run `make check`. Sorry. It seemed that I missed to close <link> tag.

>
> ======
> doc/src/sgml/ref/alter_subscription.sgml
>
> 4.1.
> <para>
> The <literal>two_phase</literal> parameter can only be altered when
> the
> - subscription is disabled. When altering the parameter from
> <literal>true</literal>
> - to <literal>false</literal>, the backend process checks for any
> incomplete
> - prepared transactions done by the logical replication worker (from when
> - <literal>two_phase</literal> parameter was still
> <literal>true</literal>)
> - and, if any are found, those are aborted.
> + subscription is disabled. Altering the parameter from
> <literal>true</literal>
> + to <literal>false</literal> will give an error when when there are
> + prepared transactions done by the logical replication worker. If you want
> + to alter the parameter forcibly in this case,
> + <link
> linkend="sql-createsubscription-params-with-force-alter"><literal>force_alter
> </literal></link>
> + option must be set to <literal>true</literal> at the same time.
> </para>
>
> TYPO: "when when"

Removed.

> Why is necessary to say "at the same time"?

Not needed. Fixed.

>
> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 4.2.
> + <varlistentry id="sql-createsubscription-params-with-force-alter">
> + <term><literal>force_alter</literal>
> (<type>boolean</type>)</term>
> + <listitem>
> + <para>
> + Specifies if the <link
> linkend="sql-altersubscription"><command>ALTER
> SUBSCRIPTION</command>
> + can be forced to proceed instead of giving an error. There is
> + currently only one scenario where this parameter has any effect:
> When
> + altering <literal>two_phase</literal> option from
> <literal>true</literal>
> + to <literal>false</literal> it is possible for there to be incomplete
> + prepared transactions done by the logical replication worker (from
> + when <literal>two_phase</literal> parameter was still
> <literal>true</literal>).
> + If <literal>force_alter</literal> is <literal>false</literal>, then
> + this will give an error; if <literal>force_alter</literal> is
> + <literal>true</literal>, then the incomplete prepared transactions
> + are aborted and the alter will proceed.
> + The default is <literal>false</literal>.
> + </para>
> + </listitem>
> + </varlistentry>
>
> IMO this will be better broken into multiple paragraphs.
>
> 1. Specifies...
> 2. There is...
> 3. The default is...

Separated.

>
> ======
> src/test/subscription/t/099_twophase_added.pl
>
> (Let's change all the on|off to true|false like you already did in patch 0002.
>
> 4.3.
> +# Try altering the two_phase option to "off." The command will fail since there
> +# is a prepared transaction and the 'force_alter' option is not specified as
> +# true.
> +my $stdout;
> +my $stderr;
>
> /off./false/

Fixed.

[1]: https://www.postgresql.org/message-id/CAHut%2BPs-RqrggaJU5w85BbeQzw9CLmmLgADVJoJ%3Dxx_4D5CWvw%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/

Attachment Content-Type Size
v11-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIP.patch application/octet-stream 25.5 KB
v11-0002-Alter-slot-option-two_phase-only-when-altering-t.patch application/octet-stream 12.3 KB
v11-0003-Abort-prepared-transactions-while-altering-two_p.patch application/octet-stream 10.2 KB
v11-0004-Add-force_alter-option-for-ALTER-SUBSCRIPTION-.-.patch application/octet-stream 56.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2024-05-16 05:04:51 RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Previous Message jian he 2024-05-16 04:14:56 Re: </replaceable> in parentesis is not usual on DOCs