From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Christophe Pettus <xof(at)thebuild(dot)com>, Postgresql General <pgsql-general(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Subject: | Re: startup process stuck in recovery |
Date: | 2017-10-11 06:54:32 |
Message-ID: | CANP8+jJhvwhngTaoT1yEmi1YD-uPDEsr=vbHz_yM+=y-NZgn=g@mail.gmail.com |
Lists: | pgsql-general |
On 10 October 2017 at 21:23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> What I see is that, given this particular test case, the backend
> process on the master never holds more than a few locks at a time.
> Each time we abort a subtransaction, the AE lock it was holding
> on the temp table it created gets dropped. However ... on the
> standby server, pre v10, the replay process attempts to take all
> 12000 of those AE locks at once. This is not a great plan.
The standby doesn't take locks "at once"; they are added as they
arrive. By design, the locks are held by the topxid, so they are not
released at subxid abort and end up being held concurrently.
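
To make the difference concrete, here is a toy model (not PostgreSQL code) of the two release policies. It assumes each subtransaction takes exactly one AE lock and then aborts. The function name `peak_locks` is made up for illustration. The `nominal_slots` figure assumes default settings (`max_locks_per_transaction` = 64, `max_connections` = 100, no prepared transactions) and is only a rough guide to the shared lock table's capacity.

```python
def peak_locks(n_subxacts, release_at_subabort):
    """Return the peak number of locks held at once in a toy model.

    Each subtransaction takes one AE lock on the temp table it creates
    and then aborts. This is an illustration only, not PostgreSQL logic.
    """
    held = set()
    peak = 0
    for subxid in range(n_subxacts):
        held.add(subxid)              # lock taken by this subtransaction
        peak = max(peak, len(held))
        if release_at_subabort:
            held.discard(subxid)      # dropped when the subxact aborts
        # otherwise the lock stays attached to the top-level xid
    held.clear()                      # top-level transaction ends
    return peak


N = 12000  # number of subtransactions in the reported test case

master_peak = peak_locks(N, release_at_subabort=True)
standby_peak = peak_locks(N, release_at_subabort=False)

# Assumed defaults: max_locks_per_transaction * (max_connections
# + max_prepared_transactions) = 64 * (100 + 0)
nominal_slots = 64 * (100 + 0)

print("released at subxid abort:", master_peak)
print("held by topxid until end:", standby_peak)
print("nominal lock table slots:", nominal_slots)
```

In this model, locks held under the topxid accumulate to the full 12000, well past the nominal slot count. Releasing at subxid abort keeps the peak at one lock.
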
> v10 and HEAD avoid the problem because the standby server doesn't
> take locks (any at all, AFAICS). I suppose this must be a
> consequence of commit 9b013dc238c, though I'm not sure exactly how.
Locks are still taken, but in 9b013dc238c we just avoid trying to
release locks when transactions don't have any.
> Anyway, it's pretty scary that it's so easy to run the replay process
> out of shared memory pre-v10. I wonder if we should consider
> backpatching that fix. Any situation where the replay process takes
> more locks concurrently than were ever held on the master is surely
> very bad news.
v10 improves on this specific point because we perform lock release at
subxid abort.
Various cases have been reported over time and this has been improving
steadily in each release.
It isn't "easy" to run the replay process out of memory, since that
clearly doesn't happen much, but yes, there are some pessimal use
cases that don't work well. The use case described seems highly
unrealistic and is certainly amenable to being rewritten.
Backpatching some of those fixes is quite risky, IMHO.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services