Re: max_standby_delay considered harmful

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Florian Pflug <fgp(at)phlo(dot)org>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: max_standby_delay considered harmful
Date: 2010-05-12 16:04:20
Message-ID: AANLkTinRrbhlHd3f__lg0S8_2AbKWye1Q19Ff7T4udtx@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 12, 2010 at 11:28 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, 2010-05-12 at 14:18 +0100, Simon Riggs wrote:
>> On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote:
>> > On Wed, May 12, 2010 at 7:26 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> > > On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:
>> > >
>> > >> I'm not sure what to make of this.  Sometimes not shutting down
>> > >> doesn't sound like a feature to me.
>> > >
>> > > It acts exactly the same in recovery as in normal running. It is not a
>> > > special feature of recovery at all, bug or otherwise.
>> >
>> > Simon, that doesn't make any sense.  We are talking about a backend
>> > getting stuck forever on an exclusive lock that is held by the startup
>> > process and which will never be released (for example, because the
>> > master has shut down and no more WAL can be obtained for replay).  The
>> > startup process does not hold locks in normal operation.
>>
>> When I test it, startup process holding a lock does not prevent shutdown
>> of a standby.
>>
>> I'd be happy to see your test case showing a bug exists and that the
>> behaviour differs from normal running.
>
> Let me put this differently: I accept that Stefan has reported a
> problem. Neither Tom nor myself can reproduce the problem. I've re-run
> Stefan's test case and restarted the server more than 400 times now
> without any issue.

OK, I'm glad to hear you've been testing this. I wasn't aware of that.

> I re-read your post where you gave what you yourself called "uninformed
> speculation". There's no real polite way to say it, but yes your
> speculation does appear to be uninformed, since it is incorrect. Reasons
> would be not least that Stefan's tests don't actually send any locks to
> the standby anyway (!),

Hmm. Well, assuming you're correct, that does seem to be a, uh,
slight problem with my theory.

> but even if they did your speculation as to the
> cause is still all wrong, as explained.

You lost me. I don't understand why the problem that I'm referring to
couldn't happen, even if it's not what's happening here.

> There is no evidence to link this behaviour with HS, as yet, and you
> should be considering the possibility the problem lies elsewhere,
> especially since it could be code you committed that is at fault.

Huh?? The evidence that this bug is linked with HS is that it occurs
on a server running in HS mode, and not otherwise. As for whether the
bug is code I committed, that's certainly possible, but keep in mind
it didn't work at all before IN HOT STANDBY MODE - and that will be
code you committed.

I'm going to go test this and see if I can figure out what's going on.
I hope you will keep at it also - as you point out, your knowledge of
this code far exceeds mine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-05-12 16:24:22 Re: List traffic
Previous Message Tom Lane 2010-05-12 16:02:52 Re: hot update doesn't work?