From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: What is happening on buildfarm member crake? |
Date: | 2014-01-27 14:14:27 |
Message-ID: | CA+TgmobSrVXfRAf2nA-Uik80ExCgZRSWzxv+RY67o-_JYRYoiw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Jan 25, 2014 at 5:04 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah. If Robert's diagnosis is correct, and it sounds pretty plausible,
> then this is really just one instance of a bug that's probably pretty
> widespread in our signal handlers. Somebody needs to go through 'em
> all and look for touches of shared memory.
I haven't made a comprehensive study of every signal handler we have,
but looking at procsignal_sigusr1_handler, the list of functions that
can get called from here is quite short: CheckProcSignal(),
RecoveryConflictInterrupt(), SetLatch(), and latch_sigusr1_handler().
Taking those in reverse order:
- latch_sigusr1_handler() is fine. Nothing down this path touches
shared memory; moreover, if we've already disowned our latch, the
waiting flag won't be set and this will do nothing at all.
- The call to SetLatch() is problematic as we already know. This is
new code in 9.4.
- RecoveryConflictInterrupt() does nothing if proc_exit_inprogress is
set. So it's fine.
- CheckProcSignal() also appears problematic. If we've already
detached shared memory, MyProcSignalSlot will be pointing to garbage,
but we'll try to dereference it anyway.
I think maybe the best fix is to *clear* MyProc in
ProcKill/AuxiliaryProcKill and MyProcSignalSlot in
CleanupProcSignalState, as shown in the attached patch. Most places
that dereference those pointers already check that they aren't null,
and we can easily add a NULL guard to the SetLatch() call in
procsignal_sigusr1_handler, which the attached patch also does.
This might not be a complete fix to every problem of this type that
exists anywhere in our code, but I think it's enough to make the world
safe for procsignal_sigusr1_handler. We also have a *large* number of
signal handlers that do little more than this:
if (MyProc)
SetLatch(&MyProc->procLatch);
...and this change should make all of those safe as well. So I think
this is a pretty good start.
Assuming nobody objects too much to this basic approach, should I
back-patch the parts of this that apply pre-9.4? The problem with
CleanupProcSignalState, at least, goes all the way back to 9.0, when
the signal-multiplexing infrastructure was introduced. But the
probability of an actual crash must be pretty low, or I imagine we
would have noticed this sooner.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment | Content-Type | Size |
---|---|---|
deinit.patch | text/x-patch | 3.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2014-01-27 14:18:49 | Re: Retain dynamic shared memory segments for postmaster lifetime |
Previous Message | Mitsumasa KONDO | 2014-01-27 13:48:16 | Re: Add min and max execute statement time in pg_stat_statement |