Re: Optionally automatically disable logical replication subscriptions on error

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "Smith, Peter" <peters(at)fast(dot)au(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optionally automatically disable logical replication subscriptions on error
Date: 2021-12-13 09:57:17
Message-ID: CALDaNm1O8iBHSjfkv2-Ld6B7C4ay8F3-O1Q2QHVrM8H7Zu40iA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 6, 2021 at 4:22 PM osumi(dot)takamichi(at)fujitsu(dot)com
<osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
>
> On Monday, December 6, 2021 1:16 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> > On Sat, Dec 4, 2021 at 12:20 AM osumi(dot)takamichi(at)fujitsu(dot)com
> > <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> > >
> > > Hi, I've made a new patch v11 that incorporated suggestions described
> > above.
> > >
> >
> > Some review comments for the v11 patch:
> Thank you for your reviews !
>
> > doc/src/sgml/ref/create_subscription.sgml
> > (1) Possible wording improvement?
> >
> > BEFORE:
> > + Specifies whether the subscription should be automatically disabled
> > + if replicating data from the publisher triggers errors. The default
> > + is <literal>false</literal>.
> > AFTER:
> > + Specifies whether the subscription should be automatically disabled
> > + if any errors are detected by subscription workers during data
> > + replication from the publisher. The default is <literal>false</literal>.
> Fixed.
>
> > src/backend/replication/logical/worker.c
> > (2) WorkerErrorRecovery comments
> > Instead of:
> >
> > + * As a preparation for disabling the subscription, emit the error,
> > + * handle the transaction and clean up the memory context of
> > + * error. ErrorContext is reset by FlushErrorState.
> >
> > why not just say:
> >
> > + Worker error recovery processing, in preparation for disabling the
> > + subscription.
> >
> > And then comment the function's code lines:
> >
> > e.g.
> >
> > /* Emit the error */
> > ...
> > /* Abort any active transaction */
> > ...
> > /* Reset the ErrorContext */
> > ...
> Agreed. Fixed.
>
> > (3) DisableSubscriptionOnError return
> >
> > The "if (!subform->subdisableonerr)" block should probably first:
> > heap_freetuple(tup);
> >
> > (regardless of the fact the only current caller will proc_exit anyway)
> Fixed.
>
> > (4) did_error flag
> >
> > I think perhaps the previously-used flag name "disable_subscription"
> > is better, or maybe "error_recovery_done".
> > Also, I think it would look better if it was set AFTER
> > WorkerErrorRecovery() was called.
> Adopted error_recovery_done
> and changed its places accordingly.
>
> > (5) DisableSubscriptionOnError LOG message
> >
> > This version of the patch removes the LOG message:
> >
> > + ereport(LOG,
> > + errmsg("logical replication subscription \"%s\" will be disabled due
> > to error: %s",
> > + MySubscription->name, edata->message));
> >
> > Perhaps a similar error message could be logged prior to EmitErrorReport()?
> >
> > e.g.
> > "logical replication subscription \"%s\" will be disabled due to an error"
> Added.
>
> I've attached the new version v12.

Thanks for the updated patch, few comments:
1) This is not required as it is not used in the caller.
+++ b/src/backend/replication/logical/launcher.c
@@ -132,6 +132,7 @@ get_subscription_list(void)
sub->dbid = subform->subdbid;
sub->owner = subform->subowner;
sub->enabled = subform->subenabled;
+ sub->disableonerr = subform->subdisableonerr;
sub->name = pstrdup(NameStr(subform->subname));
/* We don't fill fields we are not interested in. */

2) Should this be changed:
+ <structfield>subdisableonerr</structfield> <type>bool</type>
+ </para>
+ <para>
+ If true, the subscription will be disabled when subscription
+ worker detects any errors
+ </para></entry>
+ </row>
To:
+ <structfield>subdisableonerr</structfield> <type>bool</type>
+ </para>
+ <para>
+ If true, the subscription will be disabled when subscription's
+ worker detects any errors
+ </para></entry>
+ </row>

3) The last line can be slightly adjusted to keep within 80 chars:
+ Specifies whether the subscription should be automatically disabled
+ if any errors are detected by subscription workers during data
+ replication from the publisher. The default is
<literal>false</literal>.

4) Similarly this too can be handled:
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1259,7 +1259,7 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
-- All columns of pg_subscription except subconninfo are publicly readable.
REVOKE ALL ON pg_subscription FROM public;
GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
- substream, subtwophasestate, subslotname,
subsynccommit, subpublications)
+ substream, subtwophasestate, subdisableonerr,
subslotname, subsynccommit, subpublications)
ON pg_subscription TO public;

5) Since disabling subscription code is common in and else, can we
move it below:
+ if (MySubscription->disableonerr)
+ {
+ WorkerErrorRecovery();
+ error_recovery_done = true;
+ }
+ else
+ {
+ /*
+ * Some work in error recovery work is
done. Switch to the old
+ * memory context and rethrow.
+ */
+ MemoryContextSwitchTo(ecxt);
+ PG_RE_THROW();
+ }
+ }
+ else
+ {
+ /*
+ * Don't miss any error, even when it's not
reported to stats
+ * collector.
+ */
+ if (MySubscription->disableonerr)
+ {
+ WorkerErrorRecovery();
+ error_recovery_done = true;
+ }
+ else
+ /* Simply rethrow because of no recovery work */
+ PG_RE_THROW();
+ }

6) Can we move LockSharedObject below the if condition.
+ subform = (Form_pg_subscription) GETSTRUCT(tup);
+ LockSharedObject(SubscriptionRelationId, subform->oid, 0,
AccessExclusiveLock);
+
+ /*
+ * We would not be here unless this subscription's
disableonerr field was
+ * true when our worker began applying changes, but check whether that
+ * field has changed in the interim.
+ */
+ if (!subform->subdisableonerr)
+ {
+ /*
+ * Disabling the subscription has been done already. No need of
+ * additional work.
+ */
+ heap_freetuple(tup);
+ table_close(rel, RowExclusiveLock);
+ CommitTransactionCommand();
+ return;
+ }
+

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-12-13 10:16:55 Re: Assertion failure with replication origins and PREPARE TRANSACTIOn
Previous Message Peter Smith 2021-12-13 09:48:47 Re: row filtering for logical replication