From: | Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: allow specifying action when standby encounters incompatible parameter settings |
Date: | 2022-06-24 10:42:29 |
Message-ID: | CANbhV-EWDNj6Wamu7fV=URv62_F-5+PtmxX22WDAj0rPxiPeNw@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, 23 Jun 2022 at 18:45, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> Thanks for chiming in.
>
> On Thu, Jun 23, 2022 at 04:19:45PM +0100, Simon Riggs wrote:
> > I don't understand why you need this patch at all.
> >
> > Since you have automation, you can use that layer to automatically
> > restart all standbys at once, if you choose, without any help or
> > hindrance from PostgreSQL.
> >
> > But I really don't want you to do that, since that results in loss of
> > availability of the service. I'd like you to try a little harder and
> > automate this in a way that allows the service to continue with some
> > standbys available while others restart.
> >
> > All this feature does is allow you to implement things in a lazy way
> > that causes a loss of availability for users. I don't think that is of
> > benefit to PostgreSQL users, so -1 from me, on this patch (only),
> > sorry about that.
>
> Overall, this is intended for users that care more about keeping WAL replay
> caught up than a temporary loss of availability due to a restart. Without
> this, I'd need to detect that WAL replay has paused due to insufficient
> parameters and restart Postgres. If I can configure Postgres to
> automatically shut down in these scenarios, my automation can skip right to
> adjusting the parameters and starting Postgres up. Of course, if you care
> more about availability, you'd keep this parameter set to the default
> (pause) and restart on your own schedule.
There are a few choices for how we can deal with this situation:
1. Make the change blindly and then pick up the pieces afterwards
2. Check the configuration before changes are made, and make the
changes in the right order
This patch and the above argument assume that you must do (1), but
you could easily do (2).
i.e. If you know that changing specific parameters might affect
availability, why not query those parameter values on all servers
first and check whether the change will affect availability, before
you allow the changes? Why rely on PostgreSQL to pick up the pieces
because the orchestration code doesn't (yet) make configuration sanity
checks?
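As a rough illustration of option (2), here is a minimal sketch of such a
sanity check in an orchestration layer. It assumes the current settings have
already been collected from each standby (for example from pg_settings) into
plain dictionaries. The function names and the data layout are hypothetical;
only the parameter list comes from the hot standby documentation, which
requires these values on a standby to be at least as large as on the primary.

```python
# Parameters that a hot standby needs set to a value >= the primary's
# (per the PostgreSQL hot standby documentation).
LOCKSTEP_PARAMS = (
    "max_connections",
    "max_prepared_transactions",
    "max_locks_per_transaction",
    "max_wal_senders",
    "max_worker_processes",
)


def standbys_blocking_change(proposed_primary, standby_settings):
    """Return {standby_name: [param, ...]} for every standby whose current
    value is lower than the value proposed for the primary.

    proposed_primary: dict of param -> int, the values planned for the primary.
    standby_settings: dict of standby_name -> dict of param -> int, the
        values currently in effect on each standby (collected beforehand).
    """
    blocking = {}
    for name, settings in standby_settings.items():
        too_low = [
            param
            for param in LOCKSTEP_PARAMS
            # A missing value is treated as 0, i.e. conservatively "too low".
            if param in proposed_primary
            and settings.get(param, 0) < proposed_primary[param]
        ]
        if too_low:
            blocking[name] = too_low
    return blocking


def plan_order(proposed_primary, standby_settings):
    """Hypothetical helper: standbys that must be raised first, then the
    primary. Standbys that already satisfy the new values are left alone."""
    blocking = standbys_blocking_change(proposed_primary, standby_settings)
    return sorted(blocking) + ["primary"]
```

If standbys_blocking_change() returns an empty dict, the change can go to the
primary directly. Otherwise the listed standbys can be adjusted and restarted
one at a time, so that the others stay available, before the primary is
touched.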
This patch would undo a very important change, which was to keep
servers available by default, and would go back to the old behavior
for a huge fleet of Postgres databases. The old behavior of
shutdown-on-change caused
catastrophe many times for users and in one case brought down a rather
large and important service provider, whose CTO explained to me quite
clearly how stupid he thought that old behavior was. So I will not
easily agree to allowing it to be put back into PostgreSQL, simply to
avoid adding a small amount of easy code into an orchestration layer
that could and should implement documented best practice.
I am otherwise very appreciative of your insightful contributions,
just not this specific one.
Happy to discuss how we might introduce new parameters/behavior to
reduce the list of parameters that need to be kept in lock-step.
--
Simon Riggs http://www.EnterpriseDB.com/