From: | "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "Smith, Peter" <peters(at)fast(dot)au(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Optionally automatically disable logical replication subscriptions on error |
Date: | 2022-03-02 09:38:47 |
Message-ID: | TYCPR01MB83734B2F88F1FD435A8EBD98ED039@TYCPR01MB8373.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wednesday, March 2, 2022 12:47 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> After more thoughts, should we do both AbortOutOfAnyTransaction() and error
> message handling while holding interrupts? That is,
>
> HOLD_INTERRUPTS();
> EmitErrorReport();
> FlushErrorState();
> AbortOutOfAny Transaction();
> RESUME_INTERRUPTS();
> pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_work
> er());
>
> I think it's better that we do clean up first and then do other works such as
> sending the message to the stats collector and updating the catalog.
I agree. Fixed. Along with this change, I corrected the header comment of
DisableSubscriptionOnError, too.
> Here are some comments on v24 patch:
>
> + /* Look up our subscription in the catalogs */
> + tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, MyDatabaseId,
> +
> CStringGetDatum(MySubscription->name));
>
> s/catalogs/catalog/
>
> Why don't we use SUBSCRIPTIONOID with MySubscription->oid?
Changed.
> ---
> + if (!HeapTupleIsValid(tup))
> + ereport(ERROR,
> + errcode(ERRCODE_UNDEFINED_OBJECT),
> + errmsg("subscription \"%s\" does not
> exist",
> + MySubscription->name));
>
> I think we should use elog() here rather than ereport() since it's a
> should-not-happen error.
Fixed
> ---
> + /* Notify the subscription will be no longer valid */
>
> I'd suggest rephrasing it to like "Notify the subscription will be disabled". (the
> subscription is still valid actually, but just disabled).
Fixed. Also, I've made this sentence past one, because of the code place
change below.
> ---
> + /* Notify the subscription will be no longer valid */
> + ereport(LOG,
> + errmsg("logical replication subscription
> \"%s\" will be disabled due to an error",
> + MySubscription->name));
> +
>
> I think we can report the log at the end of this function rather than during the
> transaction.
Fixed. In this case, I needed to adjust the comment to indicate the processing
to disable the sub has *completed* as well.
> ---
> +my $cmd = qq(
> +CREATE TABLE tbl (i INT);
> +ALTER TABLE tbl REPLICA IDENTITY FULL;
> +CREATE INDEX tbl_idx ON tbl(i));
>
> I think we don't need to set REPLICA IDENTITY FULL to this table since there is
> notupdate/delete.
>
> Do we need tbl_idx?
Removed both the replica identity and tbl_idx;
> ---
> +$cmd = qq(
> +SELECT COUNT(1) = 1 FROM pg_catalog.pg_subscription_rel sr WHERE
> +sr.srsubstate IN ('s', 'r'));
> +$node_subscriber->poll_query_until('postgres', $cmd);
>
> It seems better to add a condition of srrelid just in case.
Makes sense. Fixed.
> ---
> +$cmd = qq(
> +SELECT count(1) = 1 FROM pg_catalog.pg_subscription s WHERE
> s.subname =
> +'sub' AND s.subenabled IS FALSE);
> +$node_subscriber->poll_query_until('postgres', $cmd)
> + or die "Timed out while waiting for subscriber to be disabled";
>
> I think that it's more natural to directly check the subscription's subenabled.
> For example:
>
> SELECT subenabled = false FROM pg_subscription WHERE subname = 'sub';
Fixed. I modified another code similar to this for tablesync error as well.
> ---
> +$cmd = q(ALTER SUBSCRIPTION sub ENABLE);
> +$node_subscriber->safe_psql('postgres', $cmd); $cmd = q(SELECT
> COUNT(1)
> += 3 FROM tbl WHERE i = 3);
> +$node_subscriber->poll_query_until('postgres', $cmd)
> + or die "Timed out while waiting for applying";
>
> I think it's better to wait for the subscriber to catch up and check the query
> result instead of using poll_query_until() so that we can check the query result
> in case where the test fails.
I modified the code to wait for the subscriber and deleted poll_query_until.
Also, when I consider the test failure for this test as you mentioned,
expecting and checking the number of return value that equals 3
would be better. So, I expressed this point in my test as well,
according to your advice.
> ---
> +$cmd = qq(DROP INDEX tbl_unique);
> +$node_subscriber->safe_psql('postgres', $cmd);
>
> In the newly added tap tests, all queries are consistently assigned to $cmd and
> executed even when the query is used only once. It seems a different style from
> the one in other tap tests. Is there any reason why we do this style for all queries
> in the tap tests?
Fixed. I removed the 'cmd' variable itself.
Attached an updated patch v26.
Best Regards,
Takamichi Osumi
Attachment | Content-Type | Size |
---|---|---|
v26-0001-Optionally-disable-subscriptions-on-error.patch | application/octet-stream | 46.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2022-03-02 09:40:18 | Re: Add id's to various elements in protocol.sgml |
Previous Message | Peter Eisentraut | 2022-03-02 09:37:36 | Re: Add id's to various elements in protocol.sgml |