From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Smith, Peter" <peters(at)fast(dot)au(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Optionally automatically disable logical replication subscriptions on error |
Date: | 2021-12-06 04:37:44 |
Message-ID: | D7470973-CEF5-4564-B03D-65A60836EDD5@enterprisedb.com |
Lists: | pgsql-hackers |
> On Dec 1, 2021, at 8:48 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> The patch disables the subscription for non-transient errors. I am not
> sure if we can easily make the call to decide whether any particular
> error is transient or not. For example, DISK_FULL or OUT_OF_MEMORY
> might not rectify itself. Why not just allow to disable the
> subscription on any error? And then let the user check the error
> either in view or logs and decide whether it would like to enable the
> subscription or do something before it (like making space in disk, or
> fixing the network).

The original idea of the patch, back when I first wrote and proposed it, was to remove the *absurdity* of retrying a transaction which, in the absence of human intervention, was guaranteed to simply fail again ad infinitum. Retrying in the face of resource errors is not *absurd* even though it might fail again ad infinitum. The reason is that there is at least a chance that the situation will clear up without human intervention.

> The other problem I see with this transient error stuff is maintaining
> the list of error codes that we think are transient. I think we need a
> discussion for each of the error_codes we are listing now and whatever
> new error_code we add in the future which doesn't seem like a good
> idea.

A reasonable rule might be: "the subscription will be disabled if the server can determine that retries cannot possibly succeed without human intervention." We shouldn't need to categorize all error codes perfectly, as long as we're conservative. What I propose is similar to how we determine whether to mark a function leakproof: we don't have to mark every leakproof function as such; we just must never mark a function leakproof if it is not.

If we're going to debate the error codes, I think we should start with an empty list and add codes to it only after sufficient analysis.
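The conservative rule can be sketched as an allowlist that starts empty, in the spirit of leakproof marking: an error code is trusted as permanent only once someone has vetted it, and everything else is retried. This is a minimal Python sketch for illustration only; `should_disable_subscription` and `VETTED_PERMANENT_ERRORS` are hypothetical names, not part of PostgreSQL or of the patch under discussion.

```python
from typing import Collection

# Hypothetical list of SQLSTATE codes that have been analysed and shown to be
# unrecoverable without human intervention. Following the conservative rule,
# it starts empty; codes are added only after review.
VETTED_PERMANENT_ERRORS: frozenset = frozenset()


def should_disable_subscription(
    sqlstate: str, vetted: Collection[str] = VETTED_PERMANENT_ERRORS
) -> bool:
    """Return True only when the error is known to be permanent.

    Any code not on the vetted list, including codes nobody has analysed
    yet, is treated as possibly transient, so the apply worker would keep
    retrying rather than disabling the subscription.
    """
    return sqlstate in vetted
```

With the list empty, a `disk_full` (SQLSTATE 53100) or `out_of_memory` (53200) error returns `False`, so the worker keeps retrying; disabling applies only to codes added after review. The asymmetry mirrors the leakproof case: leaving a permanent code off the list costs only extra retries, whereas wrongly listing a transient one would stop replication for an error that might have cleared up on its own.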
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company