Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>, James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators
Date: 2025-01-22 22:42:54
Message-ID: 6de150743574b0484f8f9b64940d31c15a01b402.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2025-01-22 at 21:48 +0100, Tomas Vondra wrote:
> But these estimates are often pretty fundamentally unreliable - maybe
> not for simple examples, but once you put an aggregate on top of a
> join,
> the errors can be pretty wild.

It would be conditional on whether there's some kind of memory
constraint or not. Setting aside the difficulty of implementing a new
memory constraint, if we assume there is one, then it would be good to
know how much memory an operator estimates that it needs.

(Also, if extra memory is available, spill files will be able to use
the OS filesystem cache, which mitigates the spilling cost.)

Another thing that would be good to know is about concurrent memory
usage. That is, if it's a blocking executor node, then it can release
all the memory from child nodes when it completes. Therefore the
concurrent memory usage might be less than just the sum of memory used
by all operators in the plan.

> I don't know if generating (and keeping) low/high-memory paths is
> quite
> feasible. Isn't that really a continuum for many paths? A hash join
> may
> need very little memory (with batching) or a lot of memory (if
> keeping
> everything in memory), so how would this work? Would we generate
> paths
> for a range of work_mem values (with different costs)?

A range might cause too much of an explosion. Let's do something simple
like define "low" to mean 1/16th, or have a separate low_work_mem GUC
(that could be an absolute number or a fraction).

There are a few ways we could pass the information down. We could just
have every operator generate twice as many paths (at least those
operators that want to use as much memory as they can get). Or we could
pass down the query_work_mem by subtracting the current operator's
memory needs and dividing what's left among its input paths.

We may have to track extra information to make sure that high-memory
paths don't dominate low-memory paths that are still useful (similar to
path keys).

Regards,
Jeff Davis

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2025-01-22 22:48:46 Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Previous Message Tom Lane 2025-01-22 22:12:41 Re: Wrong security context for deferred triggers?