Re: Crash during backend start when low on memory

From: Mats Kindahl <mats(at)timescale(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, daniel(at)yesql(dot)se, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Crash during backend start when low on memory
Date: 2023-02-06 07:37:17
Message-ID: CA+14424g_23hypp=jcTs8P5iFyDuzSU9BJzqon+9+YgW7az1zw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Feb 6, 2023 at 12:18 AM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2023-02-05 22:00:20 +0100, Mats Kindahl wrote:
> > Tom had some concerns about using PG_TRY() inside the postmaster
> > ServerLoop
>
> I don't think we gain anything in most of the places by using PG_TRY
> instead of a non-throwing allocations. The relevant is cooperating
> closely enough that just returning returning a failure indicator is
> sufficient.
>
> Note that that doesn't mean that palloc can't be used, see
> MCXT_ALLOC_NO_OOM.
>

Ah, yes, then both patches can be rewritten to not use PG_TRY().

>
> >, but I cannot see how that can cause problems since it is "just"
> > a setjmp() and longjmp(). It allocates a structure on the stack that
> > contains the values of the registers and the signal set; it is big, but
> not
> > exceedingly so. If somebody could enlighten me about if there are any
> > problems with this, I would be very grateful.
>
> If you aren't careful you end up with PG_exception_stack in a backend
> still pointing to that PG_TRY. Which means that an error during backend
> startup would jump into postmaster code. Which would not be good.
>

I did run into this case when "failing" some of the memory allocations
inside the backend with a debugger (see the PATCH.md).

There are, however, already cases in the backend startup that can raise an
error (for example, the code that creates the MessageContext in
PostgresMain). So, either if one of these error cases is triggered, or if
you make a mistake and add an ereport(ERROR,...) at the wrong place during
startup, it could cause this problem. In other words, wouldn't adding a
PG_TRY() in the code spawning the backend *prevent* such a problem and
hence reduce the risk of a mistake in code causing the backend jumping into
postmaster code (of course assuming that the code is written correctly).

>
> Particularly because this is a bugfix we need to make the fix more
> minimal than the patches proposed so far.
>

I can remove the PG_TRY() and use ereport(LOG, ...) + ExitPostmaster if it
feels like a safer path for incorporating a bug fix, but note that there is
still a risk that the backend will tear down the postmaster. For example,
allocating memory for a memory context can fail, or throwing another error
inside the backend startup can also fail, so I think that there will be
more work to do after this.

Best wishes,
Mats Kindahl

>
>
> Greetings,
>
> Andres Freund
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Masahiko Sawada 2023-02-06 08:25:42 Re: WAL segments removed from primary despite the fact that logical replication slot needs it.
Previous Message PG Bug reporting form 2023-02-06 06:43:18 BUG #17774: Assert triggered on brin_minmax_multi.c