Re: 9.2.3 crashes during archive recovery

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.2.3 crashes during archive recovery
Date: 2013-02-14 17:18:41
Message-ID: CAHGQGwHHci4daMLxJqoxgcJxyo8ZeH3hmQ3kJHaB2r5FCPaUSw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 14, 2013 at 5:15 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 13.02.2013 17:02, Tom Lane wrote:
>>
>> Heikki Linnakangas<hlinnakangas(at)vmware(dot)com> writes:
>>>
>>> At least in back-branches, I'd call this a pilot error. You can't turn a
>>> master into a standby just by creating a recovery.conf file. At least
>>> not if the master was not shut down cleanly first.
>>> ...
>>> I'm not sure that's worth the trouble, though. Perhaps it would be
>>> better to just throw an error if the control file state is
>>> DB_IN_PRODUCTION and a recovery.conf file exists.
>>
>>
>> +1 for that approach, at least until it's clear there's a market for
>> doing this sort of thing. I think the error check could be
>> back-patched, too.
>
>
> Hmm, I just realized a little problem with that approach. If you take a base
> backup using an atomic filesystem backup from a running server, and start
> archive recovery from that, that's essentially the same thing as Kyotaro's
> test case.

Yes. The resource agent for streaming replication in Pacemaker (open-source
clusterware) also relies on that archive recovery scenario. Whenever it starts
the server, it creates recovery.conf and starts the server as a standby. It
cannot start the master directly; IOW, the server always becomes the master by
being promoted from the standby. So when it restarts the server after a crash,
it runs the same recovery scenario that Kyotaro described (i.e., it forces
archive recovery instead of crash recovery).

The reason the resource agent cannot start the master directly is that it
manages three server states: Master, Slave and Down. It can move the server
from Down to Slave and back, and from Slave to Master and back. But there is
no way to move the server directly between Down and Master. This kind of
state transition model is an isolated case in clusterware, I think.
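
The transition model above can be sketched as a small state machine. This is
only an illustration: the names ALLOWED, can_move and shortest_path are
hypothetical and are not taken from the Pacemaker resource agent. It shows
that, under the rules described, reaching Master from Down always passes
through Slave.

```python
from collections import deque

# Hypothetical model of the three states described above.
# Each state maps to the set of states it can move to directly.
ALLOWED = {
    "Down": {"Slave"},
    "Slave": {"Down", "Master"},
    "Master": {"Slave"},
}


def can_move(src, dst):
    """Return True if a direct transition from src to dst is permitted."""
    return dst in ALLOWED[src]


def shortest_path(src, dst):
    """Breadth-first search for the shortest sequence of permitted moves."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(ALLOWED[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # dst is unreachable from src


if __name__ == "__main__":
    print(can_move("Down", "Master"))
    print(shortest_path("Down", "Master"))
```

In this sketch a crashed master (state Down) can only be brought back as a
Slave first, which corresponds to starting the server with recovery.conf in
place and promoting it afterwards.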

Regards,

--
Fujii Masao
