Re: Function to promote standby servers

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Function to promote standby servers
Date: 2018-10-19 00:08:10
Message-ID: 20181019000810.GC2099@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 15, 2018 at 04:16:02PM +0200, Laurenz Albe wrote:
> Michael Paquier wrote:
>> Let's remove this restriction at code level, and instead use
>> function-level ACLs so as it is possible to allow non-superusers to
>> trigger a promotion as well, say a dedicated role. We could add a
>> system role for this purpose, but I am not sure that it is worth the
>> addition yet.
>
> Agreed, done.

>> As of now, this function returns true if promotion has been triggered,
>> and false if not. However it seems to me that we should have something
>> more consistent with "pg_ctl promote", so there are actually three
>> possible states:
>> 1) Node is in recovery, with promotion triggered. pg_promote() returns
>> true in case of success in creating the trigger file.
>> 2) Node is in recovery, with promotion triggered. pg_promote() returns
>> false in case of failure in creating the trigger file.
>> 3) Node is not in recovery, where I think a call to pg_promote should
>> trigger an error. This is not handled in the patch.
>
> So far I had returned "false" in the last case, but I am fine with
> throwing an error in that case. Done.

Thanks, that looks correct to me. I think that consistency with pg_ctl
is quite important, as this is a feature present for many releases now.

>> At the end this patch needs a bit more work.
>
> Yes, it did. Thanks for the thorough review!

+ /* wait for up to a minute for promotion */
+ for (i = 0; i < WAITS_PER_SECOND * WAIT_SECONDS; ++i)
+ {
+ if (!RecoveryInProgress())
+ PG_RETURN_BOOL(true);
+
+ pg_usleep(1000000L / WAITS_PER_SECOND);
+ }
I would recommend to avoid pg_usleep and instead use a WaitLatch() or
similar to generate a wait event. The wait can then also be seen in
pg_stat_activity, which is useful for monitoring purposes. Using
RecoveryInProgress is indeed doable, and that's more simple than what I
thought first.

Something I missed to mention in the previous review: the timeout should
be manually enforceable, with a default at 60s.

Is the function marked as strict? Per the code it should be, I am not
able to test now though.

You are missing REVOKE EXECUTE ON FUNCTION pg_promote() in
system_views.sql, or any users could trigger a promotion, no?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-10-19 00:13:41 Re: lowering pg_regress privileges on Windows
Previous Message Michael Paquier 2018-10-18 23:56:20 Re: Function to promote standby servers