From: | James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators |
Date: | 2025-02-26 21:27:17 |
Message-ID: | CAJVSvF6ckAvcFkCT-cWAqYpVn2MM5zOEiTT-ubBB4mjFfCkAWQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Feb 24, 2025 at 9:55 PM James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com> wrote:
>
> On Mon, Feb 24, 2025 at 6:54 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> >
> > On Mon, 2025-02-24 at 12:46 -0800, James Hunter wrote:
> > > Attached please find the patch set I mentioned, above, in [1]. It
> > > consists of 4 patches that serve as the building blocks for and a
> > > prototype of the "query_work_mem" GUC I proposed:
> >
> > I didn't look at the details yet. But from:
> >
> > https://www.postgresql.org/message-id/CAJVSvF7x_DLj7-JrXvMB4_j%2BjzuvjG_7iXNjx5KmLBTXHPNdGA%40mail.gmail.com
> >
> > I expected something much smaller in scope, where we just add a
> > "plan_work_mem" field to the Plan struct, copy the work_mem global GUC
> > to that field when we construct a Plan node, and then reference the
> > plan_work_mem instead of the GUC directly.
Attaching a new refactoring, which splits the code changes into
patches by functionality. This refactoring yields 5 patches, each of
which is relatively localized. I hope that the result will be more
focused and more feasible to review.
* Patch 1: modifies file setrefs.c, to track all (regular) SubPlan
objects that occur inside of Plan node (qual) expressions, on a new
Plan.subPlan list (parallel to the existing Plan.initPlan list, which
is for SubPlans that have been turned into init plans).
[Patch 1 has no visible side effects, since it just populates a list
on the Plan object.]
* Patch 2: copies the work_mem [* hash_mem_multiplier] GUC(s) to a new
Plan field, Plan.workmem_limit; and modifies existing exec nodes to
read the limit from this field instead of the GUCs. Adds a new file,
"execWorkmem.c", that does the GUC copying, and modifies existing exec
nodes to read the new field(s).
[Patch 2 has no visible side effects, since it just refactors code, to
store the GUCs on a field and then read those fields instead of the
GUCs.]
* Patch 3: stores the optimizer's estimate of how much working memory
a given Path / Plan node will use, on the Path / Plan, in a new field,
"workmem". (I used "workmem" for the estimate, vs. "workmem_limit," in
Patch 2, for the limit. This is to try to be parallel with the
existing "rows" and "cost" estimates.) Involves a significant amount
of code in costsize.c and createplan.c, because sometimes this
estimate is not readily available.
(What I mean is: while Patch 2 just reads the workmem_limit from a
GUC, Patch 3 has to estimate the actual workmem by basically
multiplying (width * rows). But not all Paths / Plans cost the
possibility of spilling, so sometimes I have to copy this formula from
the corresponding exec node, etc. The logical changes in Patch 3 are
simple, but the physical LoC is larger.)
[Patch 3 has no visible side effect, since it just stores an estimate
on the Plan object.]
* Patch 4: modifies file explain.c to implement a "work_mem on" option
to the EXPLAIN command. Also adds a unit test that shows that this
"work_mem on" option works as expected.
[Patch 4 is pure visible side effect -- all it does it add a new
option to display workmem stats to the customer. But it doesn't change
any existing behavior: it just adds a new EXPLAIN option.]
* Patch 5: adds a sample extension / hook that shows how Patches 2 and
3 can be used -- without much effort! -- to implement a per-query
working-memory limit, that gives more working memory to exec nodes
that (are estimated to) need it, while taking working memory away, if
necessary, from exec nodes that (we estimate) don't need it.
The refactored patch set should be more feasible to review, since each
patch is now localized to a single piece of functionality.
Note that Patch 5 isn't essential to merge into core PostgreSQL, since
it's mostly a proof-of-concept for how a "work_mem.query_work_mem" GUC
could be implemented. But Patches 2 and 3 are needed, since they
expose the limit and estimate, on the Plan, on which Patch 5 (or any
similar working-memory extension) relies.
Thanks again,
James Hunter
Attachment | Content-Type | Size |
---|---|---|
0001-Store-non-init-plan-SubPlan-objects-in-Plan-list.patch | application/octet-stream | 36.2 KB |
0002-Store-working-memory-limit-on-Plan-field-rather-than.patch | application/octet-stream | 43.0 KB |
0003-Add-workmem-estimate-to-Path-and-Plan-nodes.patch | application/octet-stream | 52.8 KB |
0004-Add-EXPLAIN-work_mem-on-command-option.patch | application/octet-stream | 48.4 KB |
0005-Add-workmem_hook-to-allow-extensions-to-override-per.patch | application/octet-stream | 54.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Davis | 2025-02-26 21:32:26 | Re: Statistics Import and Export |
Previous Message | Robert Haas | 2025-02-26 21:07:48 | Re: Lowering temp_buffers minimum |