From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Default setting for enable_hashagg_disk |
Date: | 2020-06-10 02:15:44 |
Message-ID: | 20200610021544.GA14879@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-docs pgsql-hackers |
On Tue, Jun 09, 2020 at 06:20:13PM -0700, Melanie Plageman wrote:
> On Thu, Apr 9, 2020 at 1:02 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> > 2. enable_groupingsets_hash_disk (default false):
> >
> > This is about how we choose which grouping sets to hash and which to
> > sort when generating mixed mode paths.
> >
> > Even before this patch, there are quite a few paths that could be
> > generated. It tries to estimate the size of each grouping set's hash
> > table, and then see how many it can fit in work_mem (knapsack), while
> > also taking advantage of any path keys, etc.
> >
> > With Disk-based Hash Aggregation, in principle we can generate paths
> > representing any combination of hashing and sorting for the grouping
> > sets. But that would be overkill (and grow to a huge number of paths if
> > we have more than a handful of grouping sets). So I think the existing
> > planner logic for grouping sets is fine for now. We might come up with
> > a better approach later.
> >
> > But that created a testing problem, because if the planner estimates
> > correctly, no hashed grouping sets will spill, and the spilling code
> > won't be exercised. This GUC makes the planner disregard which grouping
> > sets' hash tables will fit, making it much easier to exercise the
> > spilling code. Is there a better way I should be testing this code
> > path?
>
> So, I was catching up on email and noticed the last email in this
> thread.
>
> I think I am not fully understanding what enable_groupingsets_hash_disk
> does. Is it only for testing?
If so, it should be in category: "Developer Options".
> Using the tests you added to src/test/regress/sql/groupingsets.sql, I
> did get a plan that looks like hashagg is spilling to disk (goes through
> hashagg_spill_tuple() code path and has number of batches reported in
> Explain) in a MixedAgg plan for a grouping sets query even with
> enable_groupingsets_hash_disk set to false.
> I'm not sure if this is more what you were looking for--or maybe I am
> misunderstanding the guc.
The behavior of the GUC is inconsistent with the other GUCs, which is
confusing. See also Robert's comments in this thread.
https://www.postgresql.org/message-id/20200407223900.GT2228%40telsasoft.com
The old (pre-13) behavior was:
- work_mem is the amount of RAM to which each query node tries to constrain
itself, and the planner will reject a plan if it's expected to exceed that.
...But a chosen plan might exceed work_mem anyway.
The new behavior in v13 seems to be:
- HashAgg now respects work_mem, but instead enable*hash_disk are
opportunisitic. A node which is *expected* to spill to disk will be
rejected.
...But at execution time, a node which exceeds work_mem will be spilled.
If someone sees a plan which spills to disk and wants to improve performance by
avoid spilling, they might SET enable_hashagg_disk=off, which might do what
they want (if the plan is rejected at plan time), or it might not, which I
think will be a surprise every time.
If someone agrees, I suggest to add this as an Opened Item.
Maybe some combination of these would be an improvement:
- change documentation to emphasize behavior;
- change EXPLAIN ouput to make it obvious this isn't misbehaving;
- rename the GUC to not start with enable_* (work_mem_exceed?)
- rename the GUC *values* to something other than on/off. On/Planner?
- change the GUC to behave like it sounds like it should, which means "off"
would allow the pre-13 behavior of exceeding work_mem.
- Maybe make it ternary, like:
exceed_work_mem: {spill_disk, planner_reject, allow}
--
Justin
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2020-06-10 06:45:38 | Re: some charts or graphs of possible permissions would be nice |
Previous Message | Melanie Plageman | 2020-06-10 01:20:13 | Re: Default setting for enable_hashagg_disk |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2020-06-10 03:00:15 | Re: FailedAssertion at ReorderBufferCheckMemoryLimit() |
Previous Message | torikoshia | 2020-06-10 01:50:58 | Re: Is it useful to record whether plans are generic or custom? |