Re: "stuck spinlock"

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christophe Pettus <xof(at)thebuild(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "stuck spinlock"
Date: 2013-12-13 02:41:41
Message-ID: 20131213024141.GF29402@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-12-12 21:15:29 -0500, Tom Lane wrote:
> Christophe Pettus <xof(at)thebuild(dot)com> writes:
> > On Dec 12, 2013, at 5:45 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Presumably, we are seeing the victim rather than the perpetrator of
> >> whatever is going wrong.
>
> > This is probing about a bit blindly, but the only thing I can see about this system that is in some way unique (and this is happening on multiple machines, so it's unlikely to be hardware) is that there are a relatively large number of relations (like, 440,000+) distributed over many schemas. Is there anything that pins a buffer that is O(N) to the number of relations?
>
> It's not a buffer *pin* that's at issue, it's a buffer header spinlock.
> And there are no loops, of any sort, that are executed while holding
> such a spinlock. At least not in the core PG code. Are you possibly
> using any nonstandard extensions?

It might be explained by a backend aborting while performing IO on a
buffer. Until it has called AbortBufferIO(), other backends will happily
loop in WaitIO(), constantly taking the buffer header spinlock and
locking io_in_progress_lock in shared mode, thereby preventing
AbortBufferIO() from succeeding.
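
To make the loop shape concrete, here is a rough single-threaded toy model of
what I mean. It is a sketch only, not the bufmgr.c code: ModelBuffer,
model_wait_io_step(), model_abort_buffer_io() and model_spin() are made-up
names, the flag value is arbitrary, and the "locks" are plain counters. It
only shows that a waiter keeps hitting the header lock for as long as the
in-progress flag stays set; it does not model real concurrency or the
starvation of the exclusive locker.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag name borrowed from PostgreSQL; the value here is arbitrary. */
#define BM_IO_IN_PROGRESS 0x01u

/* Toy stand-in for a buffer descriptor (hypothetical, NOT the real BufferDesc). */
typedef struct ModelBuffer
{
    unsigned flags;
    int      hdr_lock_acquisitions; /* how often the header "spinlock" was taken */
    int      shared_holders;        /* toy io_in_progress_lock: shared holders */
    bool     exclusive_held;        /* toy io_in_progress_lock: exclusive holder */
} ModelBuffer;

/*
 * One iteration of a WaitIO()-style loop.
 * Returns true if the waiter has to go around again.
 */
static bool
model_wait_io_step(ModelBuffer *buf)
{
    unsigned sv_flags;

    buf->hdr_lock_acquisitions++;   /* stands in for taking the header spinlock */
    sv_flags = buf->flags;          /* read the flags, then "release" the spinlock */

    if (!(sv_flags & BM_IO_IN_PROGRESS))
        return false;               /* IO finished or was cleaned up: stop waiting */

    /*
     * Shared acquire and immediate release. In this model nobody holds the
     * lock exclusively, so the waiter never sleeps and just loops.
     */
    assert(!buf->exclusive_held);
    buf->shared_holders++;
    buf->shared_holders--;
    return true;
}

/*
 * AbortBufferIO()-style cleanup: needs the lock exclusively before it can
 * clear the flag. Returns false if the lock is currently held by anyone.
 */
static bool
model_abort_buffer_io(ModelBuffer *buf)
{
    if (buf->shared_holders > 0 || buf->exclusive_held)
        return false;

    buf->exclusive_held = true;
    buf->hdr_lock_acquisitions++;   /* header lock taken to change the flags */
    buf->flags &= ~BM_IO_IN_PROGRESS;
    buf->exclusive_held = false;
    return true;
}

/* Run the waiter for at most max_iters rounds; returns the rounds that looped. */
static int
model_spin(ModelBuffer *buf, int max_iters)
{
    int n = 0;

    while (n < max_iters && model_wait_io_step(buf))
        n++;
    return n;
}
```

Starting from a ModelBuffer with BM_IO_IN_PROGRESS set, model_spin() uses up
its whole iteration budget, taking the header lock once per round. Only after
model_abort_buffer_io() has cleared the flag does the next model_spin() call
return immediately.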

Christophe: are there any "unusual" ERROR messages preceding the crash,
possibly some minutes before?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
