Re: Add the ability to limit the amount of memory that can be allocated to backends.

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: reid(dot)thompson(at)crunchydata(dot)com, Arne Roland <A(dot)Roland(at)index(dot)de>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, "stephen(dot)frost" <stephen(dot)frost(at)crunchydata(dot)com>
Subject: Re: Add the ability to limit the amount of memory that can be allocated to backends.
Date: 2023-10-20 02:36:07
Message-ID: c6520d47-e584-4287-833c-82779cc166e0@postgrespro.ru
Lists: pgsql-hackers

On 20/10/2023 05:06, Stephen Frost wrote:
> Greetings,
>
> * Andrei Lepikhov (a(dot)lepikhov(at)postgrespro(dot)ru) wrote:
>> On 19/10/2023 02:00, Stephen Frost wrote:
>>> * Andrei Lepikhov (a(dot)lepikhov(at)postgrespro(dot)ru) wrote:
>>>> On 29/9/2023 09:52, Andrei Lepikhov wrote:
>>>>> On 22/5/2023 22:59, reid(dot)thompson(at)crunchydata(dot)com wrote:
>>>>>> Attach patches updated to master.
>>>>>> Pulled from patch 2 back to patch 1 a change that was also pertinent
>>>>>> to patch 1.
>>>>> +1 to the idea, have doubts on the implementation.
>>>>>
>>>>> I have a question. I see the feature triggers ERROR on the exceeding of
>>>>> the memory limit. The superior PG_CATCH() section will handle the error.
>>>>> As I see, many such sections use memory allocations. What if some
>>>>> routine, like the CopyErrorData(), exceeds the limit, too? In this case,
>>>>> we could repeat the error until the top PG_CATCH(). Is this correct
>>>>> behaviour? Maybe to check in the exceeds_max_total_bkend_mem() for
>>>>> recursion and allow error handlers to slightly exceed this hard limit?
>>>
>>>> By the patch in attachment I try to show which sort of problems I'm worrying
>>>> about. In some PG_CATCH() sections we do CopyErrorData (allocate some
>>>> memory) before aborting the transaction. So, the allocation error can move
>>>> us out of this section before aborting. We await for soft ERROR message but
>>>> will face more hard consequences.
>>>
>>> While it's an interesting idea to consider making exceptions to the
>>> limit, and perhaps we'll do that (or have some kind of 'reserve' for
>>> such cases), this isn't really any different than today, is it? We
>>> might have a malloc() failure in the main path, end up in PG_CATCH() and
>>> then try to do a CopyErrorData() and have another malloc() failure.
>>>
>>> If we can rearrange the code to make this less likely to happen, by
>>> doing a bit more work to free() resources used in the main path before
>>> trying to do new allocations, then, sure, let's go ahead and do that,
>>> but that's independent from this effort.
>>
>> I agree that rearranging efforts can be made independently. The code in the
>> letter above was shown just as a demo of the case I'm worried about.
>> IMO, the thing that should be implemented here is a recursion level for the
>> memory limit. If processing the error, we fall into recursion with this
>> limit - we should ignore it.
>> I imagine custom extensions that use PG_CATCH() and allocate some data
>> there. At least we can raise the level of error to FATAL.
>
> Ignoring such would defeat much of the point of this effort- which is to
> get to a position where we can say with some confidence that we're not
> going to go over some limit that the user has set and therefore not
> allow ourselves to end up getting OOM killed. These are all the same
> issues that already exist today on systems which don't allow overcommit
> too, there isn't anything new here in regards to these risks, so I'm not
> really keen to complicate this to deal with issues that are already
> there.
>
> Perhaps once we've got the basics in place then we could consider
> reserving some space for handling such cases.. but I don't think it'll
> actually be very clean and what if we have an allocation that goes
> beyond what that reserved space is anyway? Then we're in the same spot
> again where we have the choice of either failing the allocation in a
> less elegant way than we might like to handle that error, or risk
> getting outright kill'd by the kernel. Of those choices, sure seems
> like failing the allocation is the better way to go.

I've got your point.
The only issue I worry about is the uncertainty and clutter this feature
can create. In the worst case, with a complex error stack (including
extensions' PG_CATCH() sections, exception blocks in stored procedures,
etc.), the backend will throw the memory limit error repeatedly. Of
course, one failed backend is better than an unexpectedly killed
postmaster, but the mix of different error reports and details looks
terrible and is challenging to debug when trouble hits. So, could we
throw a FATAL error if we reach this limit while handling an exception?
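
To make the question concrete, here is a minimal standalone sketch in
plain C. It is not PostgreSQL code and is not taken from the patch: the
names sim_alloc, sim_run, SimLevel, SIM_LIMIT and SIM_CHUNK are all
hypothetical, and the "catch block" is modelled with a simple depth
counter instead of the real PG_TRY()/PG_CATCH() machinery. It only
illustrates the escalation rule: exceeding the limit in the main path
reports ERROR, and exceeding it again while the handler is running
reports FATAL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical severity levels, loosely modelled on ERROR and FATAL. */
typedef enum { SIM_OK, SIM_ERROR, SIM_FATAL } SimLevel;

#define SIM_LIMIT 1000          /* assumed per-backend cap, in bytes */
#define SIM_CHUNK 256           /* size of each request in the main path */

static size_t sim_used = 0;           /* bytes accounted for so far */
static int    sim_handler_depth = 0;  /* > 0 while a "catch block" runs */

/*
 * Account for one allocation request and report the severity that would
 * be raised. Over the limit inside an error handler escalates to FATAL.
 */
static SimLevel
sim_alloc(size_t bytes)
{
    assert(sim_used <= SIM_LIMIT);

    if (bytes <= SIM_LIMIT - sim_used)
    {
        sim_used += bytes;
        return SIM_OK;
    }
    return (sim_handler_depth > 0) ? SIM_FATAL : SIM_ERROR;
}

/*
 * Model one "try" body that allocates body_bytes in chunks, followed by
 * a "catch" block that allocates handler_bytes (think of copying the
 * error data) before it would abort the transaction.
 */
static SimLevel
sim_run(size_t body_bytes, size_t handler_bytes)
{
    SimLevel level = SIM_OK;
    size_t   done;

    sim_used = 0;
    sim_handler_depth = 0;

    for (done = 0; done < body_bytes; done += SIM_CHUNK)
    {
        level = sim_alloc(SIM_CHUNK);
        if (level != SIM_OK)
            break;              /* limit hit in the main path */
    }
    if (level == SIM_OK)
        return SIM_OK;

    /* Enter the handler; memory from the main path is still held. */
    sim_handler_depth++;
    level = sim_alloc(handler_bytes);
    sim_handler_depth--;

    /* Handler succeeded: the original ERROR is what gets reported. */
    return (level == SIM_OK) ? SIM_ERROR : level;
}
```

Under these assumptions, calling sim_run(2048, 64) from a main()
yields SIM_ERROR (the handler still fits under the cap), while
sim_run(2048, 512) yields SIM_FATAL, i.e. one clear report instead of
the same memory limit error repeated at every level of the error stack.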

--
regards,
Andrey Lepikhov
Postgres Professional
