Re: Missing pg_control crashes postmaster

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org,David Steele <david(at)pgmasters(dot)net>,Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,Brian Faherty <anothergenericuser(at)gmail(dot)com>,"David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Missing pg_control crashes postmaster
Date: 2018-07-25 14:37:31
Message-ID: 55F8476D-DC2A-4BA9-8A34-D2605F558910@anarazel.de
Lists: pgsql-hackers

On July 25, 2018 7:18:30 AM PDT, David Steele <david(at)pgmasters(dot)net> wrote:
>On 7/23/18 7:00 PM, Tom Lane wrote:
>> Brian Faherty <anothergenericuser(at)gmail(dot)com> writes:
>>
>>> There does not really seem to be a need for this behavior as all the
>>> information postgres needs is in memory at this point. I propose with
>>> a patch to just recreate pg_control on updates if it does not exist.
>>
>> I would vote to reject any such patch; it's too likely to cause more
>> problems than it solves. Generally, if critical files like that one
>> have disappeared, trying to write new data isn't going to be enough
>> to fix it and could well result in more corruption.
>>
>> As an example, imagine that you do "rm -rf $PGDATA; initdb" without
>> remembering to shut down the old postmaster first. Currently, the
>> old postmaster will panic/quit fairly promptly and no harm done.
>> The more aggressive it is at trying to "recover" from the situation,
>> the more likely it is to corrupt the new installation.
>
>It seems much more likely that a missing/modified postmaster.pid will
>cause postgres to panic than it is for a missing pg_control to do so.
>
>Older versions of postgres don't panic until the next checkpoint and
>newer versions won't panic at all on an idle system since we fixed
>redundant checkpoints in 9.6 (6ef2eba3). An idle postgres 11 cluster
>seems happy enough to run without a pg_control file indefinitely (or at
>least 10 minutes, which is past the default checkpoint time). As soon
>as I write data or perform a checkpoint it does panic, of course.
>
>Conversely, removing/modifying postmaster.pid causes postgres to panic
>very quickly on the versions I tested, 9.4 and 11.
>
>It seems to me that doing the postmaster.pid test at checkpoint time (if
>we don't already) would be enough to protect pg_control against
>unintentionally replaced clusters.
>
>Or perhaps writing to an alternate file as David J suggests would do the
>trick.
>
>It seems like an easy win if we can find a safe way to do it, though I
>admit that this is only a benefit in corner cases.

What would we win here? Which scenario that's not contrived would be less bad due to the proposed change? This seems like complexity for its own sake.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
