Re: "stuck spinlock"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christophe Pettus <xof(at)thebuild(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "stuck spinlock"
Date: 2013-12-13 01:45:17
Message-ID: 21730.1386899117@sss.pgh.pa.us
Lists: pgsql-hackers

Christophe Pettus <xof(at)thebuild(dot)com> writes:
> On Dec 12, 2013, at 4:23 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> Could you install the -dbg package and regenerate?

> Here's another, same system, different crash:

Both of these look like absolutely run-of-the-mill buffer access attempts.
Presumably, we are seeing the victim rather than the perpetrator of
whatever is going wrong. Whoever is holding the spinlock is just
going down with the rest of the system ...

In a devel environment, I'd try using the postmaster's -T switch so that
it SIGSTOP's all the backends instead of SIGQUIT'ing them, and then I'd
run around and gdb all the other backends to try to see which one was
holding the spinlock and why. Unfortunately, that's probably not
practical in a production environment; it'd take too long to collect
the stack traces by hand. So I have no good ideas about how to debug
this, unless you can reproduce it on a devel box, or are willing to
run modified executables in production.
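For illustration only (this is not part of the original message): the manual
"gdb every backend" pass described above could be scripted roughly as below.
This is a minimal sketch that assumes the backends are already stopped, that
gdb and pgrep are installed, and that the caller may attach to the processes.
The helper names (gdb_backtrace_cmd, child_pids, collect_backtraces) are
hypothetical.

```python
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb batch-mode command that attaches to pid and prints a backtrace."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def child_pids(postmaster_pid):
    """Return the PIDs of the postmaster's direct children, as reported by pgrep -P."""
    result = subprocess.run(
        ["pgrep", "-P", str(postmaster_pid)],
        capture_output=True,
        text=True,
    )
    # pgrep prints one PID per line and prints nothing when there is no match.
    return [int(tok) for tok in result.stdout.split()]


def collect_backtraces(postmaster_pid):
    """Attach gdb to each child in turn and return {pid: backtrace text}."""
    traces = {}
    for pid in child_pids(postmaster_pid):
        out = subprocess.run(
            gdb_backtrace_cmd(pid), capture_output=True, text=True
        )
        traces[pid] = out.stdout
    return traces
```

Calling collect_backtraces() with the postmaster's PID would return one
backtrace per child process. Finding the process that actually holds the
spinlock would still mean reading the traces by hand.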

Memo to hackers: I think the SIGSTOP stuff is rather obsolete now that
most systems dump core files with process IDs embedded in the names.
What would be more useful today is an option to send SIGABRT, or some
other signal that would force core dumps. Thoughts?
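As a small illustration of the signal behavior being proposed (again, not
from the original message, and not PostgreSQL code): on a POSIX system the
default action for SIGABRT is to terminate the process and, where core dumps
are enabled, write a core file. The sketch below sends SIGABRT to a throwaway
child process and reports how it exited. The function name abort_child is
hypothetical, and whether a core file is actually written depends on settings
such as the core size limit, which the sketch does not check.

```python
import signal
import subprocess
import sys


def abort_child():
    """Start a sleeping child process, send it SIGABRT, and return its exit status."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # The child installs no SIGABRT handler, so the default action applies:
    # the process is terminated by the signal.
    child.send_signal(signal.SIGABRT)
    child.wait()
    # subprocess reports death-by-signal as the negated signal number.
    return child.returncode
```

A negative return value equal to -signal.SIGABRT indicates that the child was
killed by the signal rather than exiting on its own.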

regards, tom lane
