RE: Optionally automatically disable logical replication subscriptions on error

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Peter Smith' <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "Smith, Peter" <peters(at)fast(dot)au(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Optionally automatically disable logical replication subscriptions on error
Date: 2022-03-01 05:40:41
Message-ID: TYCPR01MB83730138EF8B26278647E8D6ED029@TYCPR01MB8373.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, March 1, 2022 9:49 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> Please see below my review comments for v22.
>
> ======
>
> 1. Commit message
>
> "table sync worker" -> "tablesync worker"
Fixed.

> ~~~
>
> 2. doc/src/sgml/catalogs.sgml
>
> + <para>
> + If true, the subscription will be disabled when subscription
> + workers detect any errors
> + </para></entry>
>
> It felt a bit strange to say "subscription" 2x in the sentence, but I am not sure
> how to improve it. Maybe like below?
>
> BEFORE
> If true, the subscription will be disabled when subscription workers detect any
> errors
>
> SUGGESTED
> If true, the subscription will be disabled if one of its workers detects an error
Fixed.

> ~~~
>
> 3. src/backend/replication/logical/worker.c - DisableSubscriptionOnError
>
> @@ -2802,6 +2803,69 @@ LogicalRepApplyLoop(XLogRecPtr
> last_received) }
>
> /*
> + * Disable the current subscription, after error recovery processing.
> + */
> +static void
> +DisableSubscriptionOnError(void)
>
> I thought the "after error recovery processing" part was a bit generic and did not
> really say what it was doing.
>
> BEFORE
> Disable the current subscription, after error recovery processing.
> SUGGESTED
> Disable the current subscription, after logging the error that caused this
> function to be called.
Fixed.

> ~~~
>
> 4. src/backend/replication/logical/worker.c - start_apply
>
> + if (MySubscription->disableonerr)
> + {
> + DisableSubscriptionOnError();
> + return;
> + }
> +
> + MemoryContextSwitchTo(ecxt);
> + PG_RE_THROW();
> + }
> + PG_END_TRY();
>
> The current code looks correct, but I felt it is a bit tricky to easily see the
> execution path after the return.
>
> Since it will effectively just exit anyhow I think it will be simpler just to do that
> explicitly right here instead of the 'return'. This will also make the code
> consistent with the same 'disableonerr' logic of the start_start_sync.
>
> SUGGESTION
> if (MySubscription->disableonerr)
> {
> DisableSubscriptionOnError();
> proc_exit(0);
> }
Fixed.

> ~~~
>
> 5. src/bin/pg_dump/pg_dump.c
>
> @@ -4463,6 +4473,9 @@ dumpSubscription(Archive *fout, const
> SubscriptionInfo *subinfo)
> if (strcmp(subinfo->subtwophasestate, two_phase_disabled) != 0)
> appendPQExpBufferStr(query, ", two_phase = on");
>
> + if (strcmp(subinfo->subdisableonerr, "f") != 0)
> + appendPQExpBufferStr(query, ", disable_on_error = true");
> +
>
> Although the code is correct, I think it would be more natural to set this option
> as true when the user wants it true. e.g. check for "t"
> same as 'subbinary' is doing. This way, even if there was some
> unknown/corrupted value the code would do nothing, which is the behaviour
> you want...
>
> SUGGESTION
> if (strcmp(subinfo->subdisableonerr, "t") == 0)
Fixed.

> ~~~
>
> 6. src/include/catalog/pg_subscription.h
>
> @@ -67,6 +67,9 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
>
> char subtwophasestate; /* Stream two-phase transactions */
>
> + bool subdisableonerr; /* True if occurrence of apply errors
> + * should disable the subscription */
>
> The comment seems not quite right because it's not just about apply errors. E.g.
> I think any error in tablesync will cause disablement too.
>
> BEFORE
> True if occurrence of apply errors should disable the subscription SUGGESTED
> True if a worker error should cause the subscription to be disabled
Fixed.

> ~~~
>
> 7. src/test/regress/sql/subscription.sql - whitespace
>
> +-- now it works
> +CREATE SUBSCRIPTION regress_testsub CONNECTION
> 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false,
> disable_on_error = false);
> +
> +\dRs+
> +
> +ALTER SUBSCRIPTION regress_testsub SET (disable_on_error = true);
> +
> +\dRs+
> +ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE); DROP
> +SUBSCRIPTION regress_testsub;
> +
>
> I think should be a blank line after that last \dRs+ just like the other one,
> because it belongs logically with the code above it, not with the ALTER
> slot_name.
Fixed.

> ~~~
>
> 8. src/test/subscription/t/028_disable_on_error.pl - filename
>
> The 028 number needs to be bumped because there is already a TAP test
> called 028 now
This is already done in v22, so I've skipped this.

> ~~~
>
> 9. src/test/subscription/t/028_disable_on_error.pl - missing test
>
> There was no test case for the last combination where the user correct the
> apply worker problem: E.g. After a previous error/disable of the subscriber,
> remove the index, publish the inserts again, and check they get applied
> properly.
Fixed.

Attached the updated version v24.

Best Regards,
Takamichi Osumi

Attachment Content-Type Size
v24-0001-Optionally-disable-subscriptions-on-error.patch application/octet-stream 46.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2022-03-01 05:42:54 Re: Separate the result of \watch for each query execution (psql)
Previous Message Bharath Rupireddy 2022-03-01 05:40:09 Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)