From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: SIGQUIT handling, redux |
Date: | 2020-09-10 16:56:55 |
Message-ID: | 251830.1599757015@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Sep 9, 2020 at 10:07 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> bgworker_die (SIGTERM)
>> Calls ereport(FATAL). This is surely not any safer than, say,
>> quickdie(). No, it's worse, because at least that won't try
>> to go out via proc_exit().
> I think bgworker_die() is pretty much a terrible idea.
Yeah. It'd likely be better to insist that bgworkers handle SIGTERM
the same way backends do, via CHECK_FOR_INTERRUPTS. Not sure how big
a change that would be.
> I think that the only way this could actually
> be safe is if you have a background worker that never uses ereport()
> itself, so that the ereport() in the signal handler can't be
> interrupting one that's already happening.
That doesn't begin to make it safe, because quite aside from anything
that happens in elog.c, we're then going to go out via proc_exit()
which will invoke who knows what.
>> StandbyDeadLockHandler (from SIGALRM)
>> StandbyTimeoutHandler (ditto)
>> Calls CancelDBBackends, which just for starters tries to acquire
>> an LWLock. I think the only reason we've gotten away with this
>> for this long is the high probability that by the time either
>> timeout fires, we're going to be blocked on a semaphore.
> Yeah, I'm not sure these are so bad. In fact, in the deadlock case, I
> believe the old coding was designed to make sure we *had to* be
> blocked on a semaphore, but I'm not sure whether that's still true.
Looking at this closer, the only code that could get interrupted
by these timeouts is a call to ProcWaitForSignal, which is
void
ProcWaitForSignal(uint32 wait_event_info)
{
(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
wait_event_info);
ResetLatch(MyLatch);
CHECK_FOR_INTERRUPTS();
}
Interrupting the latch operations might be safe enough,
although now I'm casting a side-eye at Munro's recent
changes to share WaitEvent state all over the place.
I wonder whether blocking on an LWLock inside the
signal handler will break an interrupted WaitLatch.
Also, man that CHECK_FOR_INTERRUPTS() looks like trouble.
Could we take that out?
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Jonathan S. Katz | 2020-09-10 17:08:30 | Re: PostgreSQL 13 Release Timeline |
Previous Message | Peter Eisentraut | 2020-09-10 16:53:14 | Re: Collation versioning |