Re: Immediate standby promotion

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-24 20:36:50
Message-ID: CA+U5nMKOsigzvYbe7f7k0k-15Z5RMj3DRhMSko1hNVQBWWGomA@mail.gmail.com
Lists: pgsql-hackers

On 18 September 2014 01:22, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>> "fast" promotion was actually a supported option in r8 of Postgres but
>> this option was removed when we implemented streaming replication in
>> r9.0
>>
>> The *rough* requirement is sane, but that's not the same thing as
>> saying this exact patch makes sense.
>
> Granted. Fair point.
>
>> If you are paused and you can see that WAL up ahead is damaged, then
>> YES, you do want to avoid applying it. That is possible by setting a
>> PITR target so that recovery stops at a precise location specified by
>> you. As an existing option it is better than the blunt force trauma
>> suggested here.
>
> You can pause at a recovery target, but then what if you want to go
> read/write at that point? Or what if you've got a time-delayed
> standby and you want to break replication so that it doesn't replay
> the DROP TABLE students that somebody ran on the master? It doesn't
> have to be that WAL is unreadable or corrupt; it's enough for it to
> contain changes you wish to avoid replaying.
>
>> If you really don't care, just shut down the server, run pg_resetxlog
>> and start her up - again, no need for a new option.
>
> To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it"
> seems like a friendlier interface than making somebody shut down the
> server, run pg_resetxlog, and start it up again.

It makes sense to go from paused --> promoted.

It doesn't make sense to go from normal running --> promoted, since
that is just random data loss. I very much understand the case where
somebody is shouting "get the web site up, we are losing business".
Implementing a feature that allows people to do exactly what they
asked (go live now), but loses business transactions that we thought
had been safely recorded, is not good. It implements only the exact
request, not its actual intention.

Any feature that lumps both cases together is wrongly designed and
will cause data loss.

We go to a lot of trouble to ensure data is successfully on disk and
in WAL. I won't give that up, nor do I want to make it easier to lose
data than it already is.
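
A minimal sketch of the distinction drawn above, in Python for brevity.
Everything here is hypothetical: RecoveryState, UnsafePromotion and
immediate_promote are illustrative names, not PostgreSQL code and not
any existing pg_ctl interface. It only shows the two cases being kept
apart: paused --> promoted is allowed, normal running --> promoted is
refused.

```python
from enum import Enum


class RecoveryState(Enum):
    """Hypothetical standby states, named for illustration only."""
    RUNNING = "running"    # standby is actively replaying WAL
    PAUSED = "paused"      # operator has deliberately paused replay
    PROMOTED = "promoted"  # standby has left recovery and accepts writes


class UnsafePromotion(Exception):
    """Raised when an immediate promotion request is refused."""


def immediate_promote(state: RecoveryState) -> RecoveryState:
    """Allow immediate promotion only from a deliberately paused standby.

    Assumption: a paused standby means the operator has chosen the stop
    point. From normal running, the remaining WAL would be discarded at
    an arbitrary point, so the request is refused.
    """
    if state is RecoveryState.PAUSED:
        return RecoveryState.PROMOTED
    if state is RecoveryState.RUNNING:
        raise UnsafePromotion(
            "refusing immediate promotion while replaying: "
            "pause recovery first, or use normal promotion"
        )
    raise UnsafePromotion("standby is already promoted")
```

With this guard, immediate_promote(RecoveryState.PAUSED) returns
RecoveryState.PROMOTED, while the same call with RecoveryState.RUNNING
raises UnsafePromotion instead of silently dropping unreplayed WAL.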

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
