From: | Mats Kindahl <mats(at)timescale(dot)com> |
---|---|
To: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: Use of signal-unsafe functions from signal handlers |
Date: | 2022-05-24 12:10:36 |
Message-ID: | CA+14426Zpdi6=jr9Z+C_tzS36pVa_+BbCHSeriqd=sLw4X6_iw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Tue, May 24, 2022 at 12:14 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> Hi,
>
> On Tue, May 24, 2022 at 11:42:35AM +0200, Mats Kindahl wrote:
> >
> > Typically, signal-unsafe functions should not be called from signal
> > handlers. In particular, calling malloc() directly or indirectly can
> cause
> > deadlocks, making PostgreSQL unresponsive to signals.
> >
> > Unless I am missing something, bgworker_die uses ereport, which
> indirectly
> > calls printf-like functions, which are not signal-safe since they use
> > malloc(). In rare cases, this can lead to deadlocks with stacks that look
> > like this (from https://github.com/timescale/timescaledb/issues/4200)
> >
> > #0 0x00007f0e4d1040eb in __lll_lock_wait_private () from
> > target:/lib/x86_64-linux-gnu/libc.so.6
> > [...]
> > #3 malloc (size=53)
> > [...]
> > #7 0x000055b9212235b1 in errmsg ()
> > #8 0x00007f0e27bf27a8 in handle_sigterm (postgres_signal_arg=15) at
> > /build/timescaledb/src/bgw/scheduler.c:841
> > #9 <signal handler called>
> > [...]
> > #13 free (ptr=<optimized out>)
> > #14 0x00007f0e4db12cb4 in OPENSSL_LH_free () from
> > target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
> > [...]
>
> As far as I can see the problem comes from your handle_sigterm:
>
> static void handle_sigterm(SIGNAL_ARGS)
> {
> /*
> * do not use a level >= ERROR because we don't want to exit here
> but
> * rather only during CHECK_FOR_INTERRUPTS
> */
> ereport(LOG,
> (errcode(ERRCODE_ADMIN_SHUTDOWN),
> errmsg("terminating TimescaleDB job scheduler due
> to administrator command")));
> die(postgres_signal_arg);
> }
>
> As mentioned in the topmost comment of elog.c, there's an escape hatch for
> out of memory situations, to make sure that a small enough message can be
> displayed without trying to allocate memory. But this is only for ERROR or
> higher levels. Using an ereport(LOG, ...) level in a sigterm handler
> indeed
> isn't safe.
>
Yes, and we have already fixed this in the TimescaleDB code, so this is not
an issue for us (https://github.com/timescale/timescaledb/pull/4323) (We
actually replace the bgworker_die with just die as signal handler.)
However, bgworker_die()---which is part of PostgreSQL and is the default
signal handler for background workers---is using ereport() and I think this
should be a problem for all error levels since the problem is in the usage
of the formatting function to format the error message, not where you write
it. This would mean that any existing background workers would have a
window for triggering this issue on shutdown.
From | Date | Subject | |
---|---|---|---|
Next Message | Julien Rouhaud | 2022-05-24 12:37:39 | Re: Use of signal-unsafe functions from signal handlers |
Previous Message | PG Bug reporting form | 2022-05-24 11:04:18 | BUG #17494: High demand for displacement sort |