Re: startup process stuck in recovery

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christophe Pettus <xof(at)thebuild(dot)com>, Postgresql General <pgsql-general(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: startup process stuck in recovery
Date: 2017-10-11 06:54:32
Message-ID: CANP8+jJhvwhngTaoT1yEmi1YD-uPDEsr=vbHz_yM+=y-NZgn=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 10 October 2017 at 21:23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> What I see is that, given this particular test case, the backend
> process on the master never holds more than a few locks at a time.
> Each time we abort a subtransaction, the AE lock it was holding
> on the temp table it created gets dropped. However ... on the
> standby server, pre v10, the replay process attempts to take all
> 12000 of those AE locks at once. This is not a great plan.

Standby doesn't take locks "at once", they are added just as they
arrive. The locks are held by topxid, so not released at subxid abort,
by design, so they are held concurrently.

> v10 and HEAD avoid the problem because the standby server doesn't
> take locks (any at all, AFAICS). I suppose this must be a
> consequence of commit 9b013dc238c, though I'm not sure exactly how.

Locks are still taken, but in 9b013dc238c we just avoid trying to
release locks when transactions don't have any.

> Anyway, it's pretty scary that it's so easy to run the replay process
> out of shared memory pre-v10. I wonder if we should consider
> backpatching that fix. Any situation where the replay process takes
> more locks concurrently than were ever held on the master is surely
> very bad news.

v10 improves on this specific point because we perform lock release at
subxid abort.

Various cases have been reported over time and this has been improving
steadily in each release.

It isn't "easy" to run the replay process out of memory because
clearly that doesn't happen much, but yes there are some pessimal use
cases that don't work well. The use case described seems incredibly
unreal and certainly amenable to being rewritten.

Backpatching some of those fixes is quite risky, IMHO.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Christophe Pettus 2017-10-11 07:09:23 Re: startup process stuck in recovery
Previous Message Scott Mead 2017-10-11 02:10:07 Re: Can master and slave on different PG versions?