Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators

From: James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators
Date: 2025-01-25 01:04:40
Message-ID: CAJVSvF7B+z0i=jMvvcWnq=Vh8AHZwe02BSBmOfmGjiu1Nf4_wQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 21, 2025 at 1:26 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Fri, 2025-01-10 at 10:00 -0800, James Hunter wrote:
> > How should “query_work_mem” work? Let’s start with an example:
> > suppose
> > we have an OLAP query that has 2 Hash Joins, and no other operators
> > that use work_mem.
>
> So we plan first, and then assign available memory afterward? If we do
> it that way, then the costing will be inaccurate, because the original
> costs are based on the original work_mem.
>
> It may be better than killing the query, but not ideal.

As you point out, the outcome is better, but not ideal. My intuition
is that an "ideal" solution would increase query compilation times
beyond what customers would accept...

But the outcome, even if not ideal, is at least better than killing
the query! So it is a net improvement.

> > I propose that we add a “query_work_mem” GUC, which works by
> > distributing (using some algorithm to be described in a follow-up
> > email) the entire “query_work_mem” to the query’s operators. And then
> > each operator will spill when it exceeds its own work_mem limit. So
> > we’ll preserve the existing “spill” logic as much as possible.
>
> The description above sounds too "top-down" to me. That may work, but
> has the disadvantage that costing has already happened. We should also
> consider:
>
> * Reusing the path generation infrastructure so that both "high memory"
> and "low memory" paths can be considered, and if a path requires too
> much memory in aggregate, then it would be rejected in favor of a path
> that uses less memory. This feels like it fits within the planner
> architecture the best, but it also might lead to a path explosion, so
> we may need additional controls.
>
> * Some kind of negotiation where the top level of the planner finds
> that the plan uses too much memory, and replans some or all of it. (I
> think this is similar to what you described as the "feedback loop" later in
> your email.) I agree that this is complex and may not have enough
> benefit to justify.

Generating "high memory" vs. "low memory" paths would be tricky,
because the definition of "high" vs. "low" depends on the entire path
tree, not just on a single path node. So I think it would quickly lead
to a state-space explosion, as you mention.

And I think negotiation has the same problem: it's based on the entire
tree, not just an individual path node. I think the general problem is
not so much "top-down" vs. "bottom-up", as "individual path node" vs.
"entire path tree." Today, PostgreSQL costs each path node
individually, by referring to the static "work_mem" GUC. In any
attempt to improve the optimizer's choice, I think we'd have to cost
the entire path tree. And there are many more trees than there are
tree nodes.
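
To put rough numbers on that: if a plan has 5 memory-hungry nodes, and
the planner considered even 3 candidate memory budgets for each, that
is already 3^5 = 243 budget assignments to cost for that one tree
shape, multiplied by all the tree shapes the planner already
enumerates.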

For example, whether to prefer a Nested Loop or a Hash Join that takes
2 MB of working memory depends on what the query's other joins are
doing: if they have already claimed most of the memory budget, the
Hash Join will be forced to spill and the Nested Loop may win; if
memory is plentiful, the Hash Join likely wins.

At any rate, I think we can solve the "killing the query" problem now,
and worry in the future about the ideal solution for picking the
optimal execution plan.
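
For illustration, here is a deliberately naive sketch of the
"distribute query_work_mem, then let each operator spill at its own
limit" mechanism, in planner terms. Everything in it is hypothetical:
the equal split is just a placeholder for the actual distribution
algorithm (to be described in the follow-up email), and the per-node
"workmem_limit" field does not exist today.

#include "nodes/plannodes.h"

/* Does this plan node consume work_mem? Simplified: a real version
 * would consider more node types, the Agg strategy, subplans, and the
 * child lists of Append/MergeAppend. */
static bool
uses_work_mem(Plan *plan)
{
    switch (nodeTag(plan))
    {
        case T_HashJoin:
        case T_Sort:
        case T_Agg:             /* hashed aggregation */
        case T_Material:
            return true;
        default:
            return false;
    }
}

static int
count_workmem_consumers(Plan *plan)
{
    if (plan == NULL)
        return 0;
    return (uses_work_mem(plan) ? 1 : 0) +
        count_workmem_consumers(plan->lefttree) +
        count_workmem_consumers(plan->righttree);
}

static void
assign_workmem_limits(Plan *plan, int per_node_kb)
{
    if (plan == NULL)
        return;
    if (uses_work_mem(plan))
        plan->workmem_limit = per_node_kb;  /* hypothetical field */
    assign_workmem_limits(plan->lefttree, per_node_kb);
    assign_workmem_limits(plan->righttree, per_node_kb);
}

/* Hypothetical entry point, called once planning is complete. */
static void
distribute_query_work_mem(PlannedStmt *stmt, int query_work_mem_kb)
{
    int nconsumers = count_workmem_consumers(stmt->planTree);

    if (nconsumers > 0)
        assign_workmem_limits(stmt->planTree,
                              query_work_mem_kb / nconsumers);
}

An equal split is obviously not what I'd propose (a Sort expecting a
hundred rows shouldn't get the same share as a Hash Join building a
large hash table), but it shows where the mechanism plugs in: by this
point the plan shape and costs are already fixed, which is exactly the
limitation you're pointing at.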

> Regards,
> Jeff Davis

Thanks for your comments!
James
