Re: Add the ability to limit the amount of memory that can be allocated to backends.

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: reid(dot)thompson(at)crunchydata(dot)com, Arne Roland <A(dot)Roland(at)index(dot)de>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, "stephen(dot)frost" <stephen(dot)frost(at)crunchydata(dot)com>
Subject: Re: Add the ability to limit the amount of memory that can be allocated to backends.
Date: 2023-10-19 22:06:10
Message-ID: ZTGoUt+pH8uPucfm@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Andrei Lepikhov (a(dot)lepikhov(at)postgrespro(dot)ru) wrote:
> On 19/10/2023 02:00, Stephen Frost wrote:
> > * Andrei Lepikhov (a(dot)lepikhov(at)postgrespro(dot)ru) wrote:
> > > On 29/9/2023 09:52, Andrei Lepikhov wrote:
> > > > On 22/5/2023 22:59, reid(dot)thompson(at)crunchydata(dot)com wrote:
> > > > > Attached patches updated to master.
> > > > > Pulled a change from patch 2 back into patch 1 since it was also
> > > > > pertinent to patch 1.
> > > > +1 to the idea, have doubts on the implementation.
> > > >
> > > > I have a question. I see the feature triggers an ERROR when the memory
> > > > limit is exceeded, and the enclosing PG_CATCH() section will handle the
> > > > error. As I see it, many such sections allocate memory themselves. What
> > > > if some routine, like CopyErrorData(), exceeds the limit too? In that
> > > > case, we could keep re-raising the error all the way up to the top
> > > > PG_CATCH(). Is this correct behaviour? Maybe exceeds_max_total_bkend_mem()
> > > > should check for recursion and allow error handlers to slightly exceed
> > > > this hard limit?
> >
> > > With the attached patch I try to show the sort of problems I'm worried
> > > about. In some PG_CATCH() sections we call CopyErrorData() (allocating some
> > > memory) before aborting the transaction, so an allocation error can throw
> > > us out of that section before the abort happens. We expect a soft ERROR
> > > message but face harder consequences instead.
> >
> > While it's an interesting idea to consider making exceptions to the
> > limit, and perhaps we'll do that (or have some kind of 'reserve' for
> > such cases), this isn't really any different than today, is it? We
> > might have a malloc() failure in the main path, end up in PG_CATCH() and
> > then try to do a CopyErrorData() and have another malloc() failure.
> >
> > If we can rearrange the code to make this less likely to happen, by
> > doing a bit more work to free() resources used in the main path before
> > trying to do new allocations, then, sure, let's go ahead and do that,
> > but that's independent from this effort.
>
> I agree that the rearranging can be done independently. The code in the
> message above was shown just as a demo of the case I'm worried about.
> IMO, the thing that should be implemented here is a recursion level for the
> memory limit: if we hit the limit again while processing the error, we
> should ignore it.
> I imagine custom extensions that use PG_CATCH() and allocate some data
> there. At the least, we could raise the level of the error to FATAL.
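
For concreteness, the pattern in question looks roughly like the
following. This is only a sketch of the usual PG_TRY()/PG_CATCH()
idiom, not code from the patch, and do_some_work() is a placeholder:

#include "postgres.h"

extern void do_some_work(void);    /* placeholder for the main path */

void
example_with_catch(void)
{
    MemoryContext oldcontext = CurrentMemoryContext;

    PG_TRY();
    {
        /*
         * Main path: with a hard backend memory limit, any palloc()
         * in here may ERROR out once the limit is exceeded.
         */
        do_some_work();
    }
    PG_CATCH();
    {
        ErrorData  *edata;

        /* get back out of ErrorContext before copying the error */
        MemoryContextSwitchTo(oldcontext);

        /*
         * CopyErrorData() itself palloc's, so under a hard limit this
         * allocation can fail too and re-throw before the transaction
         * is aborted and its memory released.
         */
        edata = CopyErrorData();
        FlushErrorState();

        /* ... inspect edata, clean up, perhaps re-throw ... */
        FreeErrorData(edata);
    }
    PG_END_TRY();
}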

Ignoring the limit in that case would defeat much of the point of this
effort, which is to get to a position where we can say with some
confidence that we're not going to go over a limit the user has set and
therefore won't end up getting OOM-killed. These are the same issues
that already exist today on systems which don't allow overcommit;
there's nothing new here as far as these risks go, so I'm not really
keen to complicate this to deal with issues that are already there.

Perhaps once we've got the basics in place we could consider reserving
some space for handling such cases... but I don't think it'll actually
be very clean, and what if we have an allocation that goes beyond that
reserved space anyway? Then we're back in the same spot, with the
choice of either failing the allocation in a less elegant way than we
might like for handling that error, or risking getting outright killed
by the kernel. Of those choices, failing the allocation sure seems like
the better way to go.
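
To illustrate the problem with a reserve, the check might end up looking
something like this. All of the names here are invented for illustration
and do not correspond to the patch; note how a cleanup allocation larger
than the reserve still fails:

#include "postgres.h"

/* hypothetical values and state, for illustration only */
#define ERROR_HANDLING_RESERVE  (1024 * 1024)   /* 1MB set aside for cleanup */

static Size backend_memory_limit_bytes;   /* the configured limit */
static Size total_backend_allocated;      /* memory accounted so far */
static bool handling_error;               /* set while inside PG_CATCH() */

static bool
allocation_exceeds_limit(Size requested)
{
    Size    limit = backend_memory_limit_bytes;

    /*
     * While handling an error, let allocations dip into a small reserve
     * above the configured limit, so that PG_CATCH() cleanup such as
     * CopyErrorData() is less likely to fail recursively.
     */
    if (handling_error)
        limit += ERROR_HANDLING_RESERVE;

    /* A request that doesn't fit even into the reserve still fails. */
    return total_backend_allocated + requested > limit;
}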

Thanks,

Stephen
