From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: What is happening on buildfarm member crake? |
Date: | 2014-01-31 23:15:49 |
Message-ID: | 5346.1391210149@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Jan 25, 2014 at 5:04 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah. If Robert's diagnosis is correct, and it sounds pretty plausible,
>> then this is really just one instance of a bug that's probably pretty
>> widespread in our signal handlers. Somebody needs to go through 'em
>> all and look for touches of shared memory.
> I haven't made a comprehensive study of every signal handler we have,
> but looking at procsignal_sigusr1_handler, the list of functions that
> can get called from here is quite short: CheckProcSignal(),
> RecoveryConflictInterrupt(), SetLatch(), and latch_sigusr1_handler().
You forgot HandleCatchupInterrupt and HandleNotifyInterrupt, but those
are also safe because they do nothing once proc_exit_inprogress is set.
> I think maybe the best fix is to *clear* MyProc in
> ProcKill/AuxiliaryProcKill and MyProcSignalSlot in
> CleanupProcSignalState, as shown in the attached patch.
This looks good to me in principle. A couple minor beefs:
* The addition to CleanupProcSignalState could use a comment,
similar to the one you added in ProcKill.
* I think the code in ProcKill and AuxiliaryProcKill might be more
readable if the new local variable was named "myproc" (lower case).
> and we can easily add a NULL guard to the SetLatch() call in
> procsignal_sigusr1_handler, which the attached patch also does.
Um ... no such change actually visible in patch, but it's clearly
necessary.
> This might not be a complete fix to every problem of this type that
> exists anywhere in our code, but I think it's enough to make the world
> safe for procsignal_sigusr1_handler.
Yeah; at the least this should cut down on the buildfarm noise we
are seeing ATM.
> Assuming nobody objects too much to this basic approach, should I
> back-patch the parts of this that apply pre-9.4?
Yes, I think so. We have seen some reports of irreproducible crashes
at process exit, and maybe this explains them.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2014-01-31 23:32:03 | Re: LDAP: bugfix and deprecated OpenLDAP API |
Previous Message | Tom Lane | 2014-01-31 23:01:38 | Re: WITH ORDINALITY planner improvements |