Re: Add the ability to limit the amount of memory that can be allocated to backends.

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Jim Nasby <jnasby(at)upgrade(dot)com>, James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>
Cc: Jeremy Schneider <schneider(at)ardentperf(dot)com>, "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, reid(dot)thompson(at)crunchydata(dot)com, Arne Roland <A(dot)Roland(at)index(dot)de>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, "stephen(dot)frost" <stephen(dot)frost(at)crunchydata(dot)com>
Subject: Re: Add the ability to limit the amount of memory that can be allocated to backends.
Date: 2024-12-31 23:41:05
Message-ID: c5a8939a-5459-4072-a43a-ada2adaf30ba@vondra.me
Lists: pgsql-hackers

On 12/31/24 21:46, Jim Nasby wrote:
> On Dec 30, 2024, at 7:05 PM, James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com> wrote:
>>
>> On Sat, Dec 28, 2024 at 11:24 PM Jim Nasby <jnasby(at)upgrade(dot)com> wrote:
>>>
>>> IMHO none of this will be very sane until we actually have cluster-level limits. One sudden burst in active connections and you still OOM the instance.
>>
>> Fwiw, PG does support the "max_connections" GUC, so a
>> backend/connection-level limit, times "max_connections", yields a
>> cluster-level limit.
>
> max_connections is useless here, for two reasons:
>
> 1. Changing it requires a restart. That’s at *best* a real PITA in production. [1]
> 2. It still doesn’t solve the actual problem. Unless your workload *and* your data are extremely homogeneous, you can’t simply limit the number of connections and call it a day. A slight change in incoming queries, OR in the data that the queries are looking at, and you go from running fine to meltdown. You don’t even need a plan flip for this to happen, just the same plan run at the same rate but now accessing more data than before.
>

I really don't follow your argument ...

Yes, changing max_connections requires a restart - so what? AFAIK the
point James was making is that if you multiply max_connections by the
per-backend limit, you get a cluster-wide limit. And presumably the
per-backend limit would be a GUC not requiring a restart.
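
To spell out the arithmetic (the numbers and the per-backend GUC are
made up for illustration; no such GUC exists today):

#include <stdio.h>

int main(void)
{
    int  max_connections = 100;         /* restart-only GUC */
    long backend_mem_limit_mb = 256;    /* hypothetical reloadable GUC */

    /* hard cluster-wide ceiling on backend memory */
    printf("worst case: %ld MB\n", max_connections * backend_mem_limit_mb);
    return 0;
}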

Yes, high values of max_connections are problematic. I don't see how a
global limit would fundamentally change that. In fact, it could
introduce even more confusing failures, with backends erroring out
because some unrelated backend did something weird.

FWIW I'm not opposed to having some global memory limit, but as I
explained earlier, I don't see a way to do that sensibly without having
a per-backend limit first. Because if you have a global limit, a single
backend consuming memory could cause all kinds of weird failures in
random other backends.
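
To illustrate why I think the ordering matters, here is a minimal
sketch of a per-backend check in an allocation wrapper (all names
invented, not from any of the posted patches). The backend exceeding
its own cap is the one that gets the failure, not some random victim:

#include <stdio.h>
#include <stdlib.h>

/* Invented names; a model, not code from any of the patches. */
static size_t backend_mem_limit = 64UL * 1024 * 1024;  /* per-backend cap */
static size_t backend_mem_used = 0;

static void *
limited_alloc(size_t size)
{
    if (backend_mem_used + size > backend_mem_limit)
        return NULL;            /* only *this* backend fails */
    backend_mem_used += size;
    return malloc(size);
}

int main(void)
{
    void *a = limited_alloc(32UL * 1024 * 1024);
    void *b = limited_alloc(48UL * 1024 * 1024);    /* exceeds the cap */

    printf("a: %s, b: %s\n", a ? "ok" : "refused", b ? "ok" : "refused");
    free(a);
    free(b);                    /* free(NULL) is a no-op */
    return 0;
}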

> Most of what I’ve seen on this thread is discussing ways to *optimize* how much memory the set of running backends can consume. Adjusting how you slice the memory pie across backends, or even within a single backend, is optimization. While that’s a great goal that I do support, it will never fully fix the problem.
>
> At some point you need to throw your hands in the air and start tossing memory errors, because you don’t have control over how much work is being thrown at the engine. The only way the engine can exert control over that is to hold new transactions from starting when the system is under duress (ie, workload management). While workload managers can be quite sophisticated (aka, complex), the nice thing about limiting this scope to work_mem, and only as a means to prevent complete overload, is that the problem becomes a lot simpler, since you’re only looking at one metric and not trying to support any kind of priority system.
>
> The only fanciness I think an MVP would need is a GUC to control how long a transaction can sit waiting before it throws an error. Frankly, that sounds a lot less complex, and much easier for DBAs to adjust, than trying to teach the planner how to apportion out per-node work_mem limits.
>
> As I said, I’m not opposed to optimizations, I just think they’re very much cart-before-the-horse.
>
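
If I understand the workload management you describe, it's essentially
an admission gate on a shared work_mem pool - reserve the memory before
the transaction starts, wait if the pool is exhausted, error out after
a timeout. Roughly this toy model (plain C with threads, every name
invented, nothing from core or from any patch):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_cond = PTHREAD_COND_INITIALIZER;
static long pool_mb = 96;       /* shared work_mem budget, in MB */

/* Reserve "need" MB before the transaction starts; give up after
 * timeout_sec (the one GUC the MVP needs). */
static bool
admit(long need, int timeout_sec)
{
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&pool_lock);
    while (pool_mb < need)
    {
        if (pthread_cond_timedwait(&pool_cond, &pool_lock, &deadline) != 0)
        {
            pthread_mutex_unlock(&pool_lock);
            return false;       /* timed out; caller raises the error */
        }
    }
    pool_mb -= need;
    pthread_mutex_unlock(&pool_lock);
    return true;
}

static void
release(long mb)
{
    pthread_mutex_lock(&pool_lock);
    pool_mb += mb;
    pthread_cond_broadcast(&pool_cond);
    pthread_mutex_unlock(&pool_lock);
}

int main(void)
{
    if (admit(64, 1))           /* ask for 64 MB, wait at most 1 second */
    {
        printf("admitted\n");
        release(64);
    }
    else
        printf("admission timed out, abort the transaction\n");
    return 0;
}

You're right that the only real tunable in that is the timeout.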

What optimization? I didn't notice anything like that. I don't see how
"adjusting how you slice the memory pie across backends" counts as an
optimization. I mean, that's exactly what a memory limit is meant to do.

Similarly, there was a proposal to do the planning with the regular
work_mem, and then go back and adjust the per-node limits so that the
plan fits within a global limit. That does not seem like an
optimization either ... (more like the opposite of one).
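
For the record, that proposal amounts to roughly this toy sketch of
proportional scaling over the plan's nodes (illustrative only, not
code from the actual proposal):

#include <stdio.h>

/* Scale every node's budget down so the whole plan fits. */
static void
fit_to_budget(double node_mem[], int nnodes, double budget)
{
    double total = 0.0;

    for (int i = 0; i < nnodes; i++)
        total += node_mem[i];

    if (total <= budget)
        return;                 /* already fits, nothing to do */

    for (int i = 0; i < nnodes; i++)
        node_mem[i] *= budget / total;      /* proportional shrink */
}

int main(void)
{
    double node_mem[] = {64.0, 32.0, 32.0}; /* MB per plan node */

    fit_to_budget(node_mem, 3, 96.0);       /* 96 MB query budget */
    for (int i = 0; i < 3; i++)
        printf("node %d: %.1f MB\n", i, node_mem[i]);
    return 0;
}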

> 1: While it’d be a lot of work to make max_connections dynamic, one thing we could do fairly easily would be to introduce another GUC (max_backends?) that actually controls the total number of allowed backends for everything. The sum of max_connections + autovac workers + background workers + whatever else I’m forgetting would have to be less than that. The idea here is that you’d normally run with max_connections set significantly lower than max_backends. That means that if you need to adjust any of these GUCs (other than max_backends) you don’t need to restart - the new limits would just apply to new connection requests.

I don't quite understand how max_backends helps with anything except
allowing the connection limit to be changed without a restart, or why
it would be needed for introducing a memory limit. To me those seem
very much like two separate features.

regards

--
Tomas Vondra
