Re: Async-unsafe functions in signal handlers

From: Andres Freund <andres(at)anarazel(dot)de>
To: Denis Smirnov <sd(at)arenadata(dot)io>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Async-unsafe functions in signal handlers
Date: 2021-08-27 21:05:47
Message-ID: 20210827210547.nntbtdlhqbxbo4wu@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-08-27 23:51:27 +0300, Denis Smirnov wrote:
> > 26 авг. 2021 г., в 23:39, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> написал(а):
> >
> > (BTW, I think it's pretty silly to imagine that adding backtrace()
> > calls inside ereport is making things any more dangerous. ereport
> > has pretty much always carried a likelihood of calling malloc(),
> > for example.)
>
> I have taken a look through the signal handlers and found out that many of them use malloc() via ereport() and elog(). Here is the list:
>
> SIGUSR1
> - procsignal_sigusr1_handler(): autoprewarm, autovacuum, bgworker, bgwriter, checkpointer, pgarch, startup, walwriter, walreciever, walsender

There shouldn't be meaningful uses of elog/ereport() inside
procsignal_sigusr1_handler(). The exception I found was an elog(FATAL) for
unreachable code.

> - sigusr1_handler(): postmaster
> SIGHUP:
> - SIGHUP_handler(): postmaster
> SIGCHLD:
> - reaper(): postmaster

I think these runs in a very controlled set of circumstances because most of
postmaster runs with signals masked.

> SIGFPE:
> - FloatExceptionHandler(): autovacuum, bgworker, postgres, plperl

Yep, although as discussed this might not be a "real" problem because it
should only run during an instruction triggering an FPE.

> SIGQUIT:
> - quickdie(): postgres

Yes, this is an issue. I've previously argued for handling this via write()
and _exit(), instead of the full ereport() machinery. However, we have a
bandaid that deals with possible hangs, by SIGKILLing when processes don't
shut down (at that point things have already gone quite south, so that's not
an issue).

> SIGTERM:
> - bgworker_die(): bgworker

Bad.

> SIGALRM:
> - handle_sig_alarm(): autovacuum, bgworker, postmaster, startup, walsender, postgres

I don't think there reachable elogs in there. I'm not concerned about e.g.
elog(FATAL, "timeout index %d out of range 0..%d", index,
num_active_timeouts - 1);
because that's not something that should ever be reachable in a production
scenario. If it is, there's bigger problems.

Perhaps we ought to have a version of Assert() that's enabled in production
builds as well, and that outputs the error messages via write() and then
_exit()s?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-08-27 21:23:23 Re: Can we get rid of repeated queries from pg_dump?
Previous Message Denis Smirnov 2021-08-27 20:51:27 Re: Async-unsafe functions in signal handlers