Re: Add the ability to limit the amount of memory that can be allocated to backends.

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru>, James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, reid(dot)thompson(at)crunchydata(dot)com, Arne Roland <A(dot)Roland(at)index(dot)de>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, "stephen(dot)frost" <stephen(dot)frost(at)crunchydata(dot)com>
Subject: Re: Add the ability to limit the amount of memory that can be allocated to backends.
Date: 2024-12-28 14:57:44
Message-ID: 4806d917-c019-49c7-9182-1203129cd295@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/28/24 13:36, Anton A. Melnikov wrote:
> Hi!
>
> On 28.12.2024 04:48, Tomas Vondra wrote:
>> On 12/27/24 20:14, James Hunter wrote:
>>> Reviving this thread, because I am thinking about something related --
>>> please ignore the "On Fri, Dec 27, 2024" date, this seems to be an
>>> artifact of me re-sending the message, from the list archive. The
>>> original message was from January 28, 2024.
>>>
>>> On Fri, Dec 27, 2024 at 11:02 AM Tomas Vondra
>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>>
>>>> Firstly, I agree with the goal of having a way to account for memory
>>>> used by the backends, and also ability to enforce some sort of limit.
>>>> It's difficult to track the memory at the OS level (interpreting RSS
>>>> values is not trivial), and work_mem is not sufficient to enforce a
>>>> backend-level limit, not even talking about a global limit.
>>>>
>>>> But as I said earlier, it seems quite strange to start by introducing
>>>> some sort of global limit, combining memory for all backends. I do
>>>> understand that the intent is to have such global limit in order to
>>>> prevent issues with the OOM killer and/or not to interfere with other
>>>> stuff running on the same machine. And while I'm not saying we should
>>>> not have such limit, every time I wished to have a memory limit it
>>>> was a
>>>> backend-level one. Ideally a workmem-like limit that would "adjust" the
>>>> work_mem values used by the optimizer (but that's not what this patch
>>>> aims to do), or at least a backstop in case something goes wrong
>>>> (say, a
>>>> memory leak, OLTP application issuing complex queries, etc.).
>>>
>>> I think what Tomas suggests is the right strategy.
>
> I'm also interested in this topic. And agreed that it's best to move
> from the limit
> for a separate backend to the global one. In more details let me suggest
> the following steps or parts:
> 1) realize memory limitation for a separate backend independent from the
> work_mem GUC;
> 2) add workmem-like limit that would "adjust" the work_mem values used by
> the optimize as Thomas suggested;
> 3) add global limit for all backends.
>
> As for p.1 there is a patch that was originally suggested by my colleague
> Maxim Orlov <orlovmg(at)gmail(dot)com> and which i modified for the current
> master.
> This patch introduces the only max_backend_memory GUC that specifies
> the maximum amount of memory that can be allocated to a backend.
> Zero value means no limit.
> If the allocated memory size is exceeded, a standard "out of memory"
> error will be issued.
> Also the patch introduces the
> pg_get_backend_memory_contexts_total_bytes() function,
> which allows to know how many bytes have already been allocated
> to the process in contexts. And in the build with asserts it adds
> the pg_get_backend_memory_allocation_stats() function that allows
> to get additional information about memory allocations for debug purposes.
>

Not sure a simple memory limit like in the patch (which just adds memory
accounting + OOM after hitting the limit) can work as anything but a the
last safety measure. It seems to me the limit would have to be set to a
value that's much higher than any backend would realistically need.

The first challenge I envision is that without any feedback (either to
the planner or executor), it may break queries quite easily. It just
takes the planner to add one more sort / hash join / materialize (which
it can do arbitrarily, as it has no concept of the memory limit), and
now the query can't run.

And secondly, there are allocations that we don't restrict by work_mem,
but this memory limit would include them, ofc. The main example that I
can think of is hash join, where we (currently) don't account for the
BufFile arrays, and that can already lead to annoying OOM issues, see
e.g. [1] [2] and [3] (I'm sure there are more threads about the issue).

It's wrong we don't account for the BufFile arrays, so it's not included
in work_mem (or considered is some other way). And maybe we should
finally improve that (not the fault of this patch, ofc). But it's hard,
because as the amount of data grows, we have to add more batches - and
at some point that starts adding more memory than we save. Ultimately,
we end up breaking work_mem one way or the other - either we add more
batches, or allow the hash table to exceed work_mem.

That's a preexisting issue, of course. But wouldn't this simple limit
make the situation worse? The query would likely complete OK (otherwise
we'd get many more reports about OOM), but with the new limit it would
probably fail with OOM. Maybe that's correct, and the response to that
is "Don't set the limit with such queries," although it's hard to say in
advance and it can happen randomly? Not sure.

What bothers me a bit is that users would likely try to reduce work_mem,
but that's the wrong thing to do in this case - it just increases the
number of batches, and thus makes the situation worse.

[1]
https://www.postgresql.org/message-id/20190504003414.bulcbnge3rhwhcsh%40development

[2] https://www.postgresql.org/message-id/20230228190643.1e368315%40karst

[3]
https://www.postgresql.org/message-id/bc138e9f-c89e-9147-5395-61d51a757b3b%40gusw.net

>>> This strategy solves the ongoing problem of how to set work_mem, if
>>> some queries have lots of operators and others don't -- now we just
>>> set backend_work_mem, as a limit on the entire query's total work_mem.
>>> And a bit of integration with the optimizer will allow us to
>>> distribute the total backend_work_mem to individual execution nodes,
>>> with the goal of minimizing spilling, without exceeding the
>>> backend_work_mem limit.
>
> As for p.2 maybe one can set a maximum number of parallel sort or
> hash table operations before writing to disk instead of absolute
> value in the work_mem GUC? E.g. introduce а max_query_operations GUC
> or a variable in such a way that old work_mem will be equal
> to max_backend_memory divided by max_query_operations.
>
> What do you think about such approach?
>

It's not clear to me how you want to calculate the number of parallel
operations that might use work_mem. Can you elaborate?

regards

--
Tomas Vondra

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2024-12-28 15:32:17 Re: Re: proposal: schema variables
Previous Message Anton A. Melnikov 2024-12-28 12:36:37 Re: Add the ability to limit the amount of memory that can be allocated to backends.