From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: What is happening on buildfarm member crake?
Date: 2014-01-25 22:34:58
Message-ID: 52E43C12.3020100@dunslane.net
Lists: pgsql-hackers
On 01/25/2014 05:04 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 01/19/2014 08:22 PM, Robert Haas wrote:
>>> Hmm, that looks an awful lot like the SIGUSR1 signal handler is
>>> getting called after we've already completed shmem_exit. And indeed
>>> that seems like the sort of thing that would result in dying horribly
>>> in just this way. The obvious fix seems to be to check
>>> proc_exit_inprogress before doing anything that might touch shared
>>> memory, but there are a lot of other SIGUSR1 handlers that don't do
>>> that either. However, in those cases, the likely cause of a SIGUSR1
>>> would be a sinval catchup interrupt or a recovery conflict, which
>>> aren't likely to be so far delayed that they arrive after we've
>>> already disconnected from shared memory. But the dynamic background
>>> workers stuff adds a new possible cause of SIGUSR1: the postmaster
>>> letting us know that a child has started or died. And that could
>>> happen even after we've detached shared memory.
>> Is anything happening about this? We're still getting quite a few of
>> these:
>> <http://www.pgbuildfarm.org/cgi-bin/show_failures.pl?max_days=3&member=crake>
> Yeah. If Robert's diagnosis is correct, and it sounds pretty plausible,
> then this is really just one instance of a bug that's probably pretty
> widespread in our signal handlers. Somebody needs to go through 'em
> all and look for touches of shared memory.
>
> I'm not sure if we can just disable signal response the moment the
> proc_exit_inprogress flag goes up, though. In some cases such as lock
> handling, it's likely that we need that functionality to keep working
> for some part of the shutdown process. We might end up having to disable
> individual signal handlers at appropriate places.
>
> Ick.
>
>
Yeah. Since we're now providing for user-defined backends, maybe we need
some mechanism for white-listing handlers that can run, in whole or in
part, during shutdown.
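
For illustration, the guard Robert describes might look something like the
sketch below. This is a minimal stand-alone mock, not actual backend code:
`proc_exit_inprogress`, `shared_counter`, `shmem_attached`, and the handler
name are stand-ins here, and the real fix would have to live in each
affected handler:

```c
#include <signal.h>
#include <stdbool.h>

/* Stand-ins for backend state; names are illustrative only. */
static volatile sig_atomic_t proc_exit_inprogress = 0;
static int shared_counter = 0;      /* pretend this lives in shared memory */
static bool shmem_attached = true;  /* cleared by our mock shmem_exit() */

/*
 * The guard under discussion: once process exit has begun, shared
 * memory may already be detached, so the handler must not touch it.
 */
static void
sigusr1_handler(int signo)
{
    (void) signo;
    if (proc_exit_inprogress)
        return;                 /* too late; shared memory may be gone */
    shared_counter++;           /* otherwise safe to touch "shared" state */
}

static void
mock_shmem_exit(void)
{
    proc_exit_inprogress = 1;   /* raise the flag first ... */
    shmem_attached = false;     /* ... then detach shared memory */
}
```

As Tom notes, the hard part is that a blanket early return like this may be
too aggressive: some handlers (e.g. lock handling) must keep working through
part of shutdown, which is why per-handler disabling may be needed instead.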
cheers
andrew