From: James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, reid(dot)thompson(at)crunchydata(dot)com, Arne Roland <A(dot)Roland(at)index(dot)de>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, "stephen(dot)frost" <stephen(dot)frost(at)crunchydata(dot)com>
Subject: Re: Add the ability to limit the amount of memory that can be allocated to backends.
Date: 2024-12-31 01:17:58
Message-ID: CAJVSvF5dns2ZR=+BURO41D20Sfa8EM1f-WK+YX5Ytm3zpKfYNA@mail.gmail.com
Lists: pgsql-hackers
On Mon, Dec 30, 2024 at 3:12 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Sat, 28 Dec 2024 at 08:14, James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com> wrote:
> > 2. We use this backend_work_mem to "adjust" work_mem values used by
> > the executor. (I don't care about the optimizer right now -- optimizer
> > just does its best to predict what will happen at runtime.)
>
> While I do want to see improvements in this area, I think "don't care
> about the optimizer" is going to cause performance issues. The
> problem is that the optimizer takes into account what work_mem is set
> to when calculating the costs of work_mem-consuming node types. See
> costsize.c for usages of "work_mem". If you go and reduce the amount
> of memory a given node can consume after the costs have been applied
> then we may end up in a situation where some other plan would have
> suited much better.
Yes, but this sort of thing is unavoidable in SQL compilers. We can
always generate and cost more paths, but that increases compile time,
which eats away at runtime.
I agree that it's good to mitigate these sorts of cases, but I see
that as a refinement on the original idea.
> There's also the problem with what to do when you chop work_mem down
> so far that the remaining size is just a pitiful chunk. For now,
> work_mem can't go below 64 kilobytes. You might think that's a very
> unlikely situation that it'd be chopped down so far, but with
> partition-wise join and partition-wise aggregate, we could end up
> using a work_mem per partition and if you have thousands of partitions
> then you might end up reducing work_mem by quite a large amount.
I look at this from the opposite direction: if I am doing a
partition-wise join or aggregate, and that join or aggregate requires
more memory than I feel comfortable letting it use, then the current
PG behavior doesn't so much solve this problem as throw up its hands
and hope for the best. Because if I don't have 64 KB * (# joins) * (#
join partitions) of memory available today, then either my query is
going to run OOM -- or I am lying, and I really have more memory than
I claim. :)
> I think the best solution to this is the memory grant stuff I talked
> about in [1].
I am absolutely in favor of memory grants. Long term, I think memory
grants, like what you described in [1], are necessary.
However, per-backend limits are:
(a) easier;
(b) a good intermediate step; and
(c) still necessary in a world in which we have memory grants, to
prevent deadlocks.
In [1], you suggested using PG's deadlock detector, but this has the
side effect of killing queries, which I would like to avoid.
Even with memory grants, if you give each backend a minimal amount of
reserved working memory, you can avoid deadlock by making the
"offending" node spill and run slowly, rather than getting canceled.
> That does require figuring out which nodes will consume
> the work_mem concurrently so that infrastructure you talked about to
> do that would be a good step forward towards that, but that's probably
> not the most difficult part of that idea.
Yeah -- for example, specifying work_mem per node, on the PlanState,
gives the executor some options if it can't check out as much memory
as it would like.
> I definitely encourage work in this area, but I think what you're
> proposing might be swapping one problem for another problem.
>
> David
>
> [1] https://www.postgresql.org/message-id/CAApHDvrzacGEA1ZRea2aio_LAi2fQcgoK74bZGfBddg4ymW-Ow@mail.gmail.com
Thanks for the feedback!
James