Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators
Date: 2025-01-21 21:26:52
Message-ID: 0beedccb92ba5a1db386bd09b44e15b20c1ac1fc.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2025-01-10 at 10:00 -0800, James Hunter wrote:
> How should “query_work_mem” work? Let’s start with an example:
> suppose
> we have an OLAP query that has 2 Hash Joins, and no other operators
> that use work_mem.

So we plan first, and then assign available memory afterward? If we do
it that way, then the costing will be inaccurate, because the original
costs are based on the original work_mem.

It may be better than killing the query, but not ideal.

> -- Second, we could say, instead: the small Hash Join is *highly
> unlikely* to use > 1 MB, so let’s just give both Hash Joins 1023 MB,
> expecting that the small Hash Join won’t use more than 1 MB of its
> 1023 MB allotment anyway, so we won’t run OOM. In effect, we’re
> oversubscribing, betting that the small Hash Join will just stay
> within some smaller, “unenforced” memory limit.
>
> In this example, this bet is probably fine — but it won’t work in
> general. I don’t want to be in the business of gambling with customer
> resources: if the small Hash Join is unlikely to use more than 1 MB,
> then let’s just assign it 1 MB of work_mem.

I like this idea. Operators that either know they don't need much
memory, or estimate that they don't need much memory, can constrain
themselves. That would protect against misestimations and advertise to
the higher levels of the planner how much memory the operator actually
wants. Right now, the planner doesn't know which operators need a lot
of memory and which ones don't need any significant amount at all.

The challenge, of course, is what the higher levels of the planner
would do with that information, which goes to the rest of your
proposal. But tracking the information seems very reasonable to me.

> I propose that we add a “query_work_mem” GUC, which works by
> distributing (using some algorithm to be described in a follow-up
> email) the entire “query_work_mem” to the query’s operators. And then
> each operator will spill when it exceeds its own work_mem limit. So
> we’ll preserve the existing “spill” logic as much as possible.

The description above sounds too "top-down" to me. That may work, but
has the disadvantage that costing has already happened. We should also
consider:

* Reusing the path generation infrastructure so that both "high memory"
and "low memory" paths can be considered, and if a path requires too
much memory in aggregate, then it would be rejected in favor of a path
that uses less memory. This feels like it fits within the planner
architecture the best, but it also might lead to a path explosion, so
we may need additional controls.

* Some kind of negotiation where the top level of the planner finds
that the plan uses too much memory, and replans some or all of it. (I
think is similar to what you described as the "feedback loop" later in
your email.) I agree that this is complex and may not have enough
benefit to justify.

Regards,
Jeff Davis

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-01-21 21:30:17 Re: attndims, typndims still not enforced, but make the value within a sane threshold
Previous Message Sami Imseih 2025-01-21 21:22:08 Re: pg_stat_statements: improve loading and saving routines for the dump file