From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Default setting for enable_hashagg_disk |
Date: | 2020-07-10 14:17:14 |
Message-ID: | 20200710141714.GI12375@tamriel.snowman.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-docs pgsql-hackers |
Greetings,
* Peter Geoghegan (pg(at)bowt(dot)ie) wrote:
> On Thu, Jul 9, 2020 at 5:08 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > I didn't, and don't, think it particularly relevant to the discussion,
> > but if you don't like the comparison to Sort then we could compare it to
> > a HashJoin instead- the point is that, yes, if you are willing to give
> > more memory to a given operation, it's going to go faster, and the way
> > users tell us that they'd like the query to use more memory is already
> > well-defined and understood to be through work_mem. We do not do them a
> > service by ignoring that.
>
> The hash_mem design (as it stands) would affect both hash join and
> hash aggregate. I believe that it makes most sense to have hash-based
> nodes under the control of a single GUC. I believe that this
> granularity will cause the least problems. It certainly represents a
> trade-off.
So, now this has moved from being a hack to deal with a possible
regression for a small number of users due to new behavior in one node,
to a change that has impacts on other nodes that hadn't been changed,
all happening during beta.
No, I don't agree with this. Now is not the time for designing new
features for v13.
> work_mem is less of a problem with hash join, primarily because join
> estimates are usually a lot better than grouping estimates. But it is
> nevertheless something that it makes sense to put in the same
> conceptual bucket as hash aggregate, pending a future root and branch
> redesign of work_mem.
I'm still not thrilled with the 'hash_mem' kind of idea as it's going in
the wrong direction because what's actually needed is a way to properly
consider and track overall memory usage- a redesign of work_mem (or some
new parameter, but it wouldn't be 'hash_mem') as you say, but all of
this discussion should be targeting v14.
> Like I said, the escape hatch GUC is not my preferred solution. But at
> least it acknowledges the problem. I don't think that anyone (or
> anyone else) believes that work_mem doesn't have serious limitations.
work_mem obviously has serious limitations, but that's not novel or new
or unexpected by anyone.
> > We have a parameter which already drives this and which users are
> > welcome to (and quite often do) tune. I disagree that anything further
> > is either essential or particularly desirable.
>
> This is a user hostile attitude.
I don't find that argument convincing, at all.
> > I'm really rather astounded at the direction this has been going in.
>
> Why?
Due to the fact that we're in beta and now is not the time to be
redesigning this feature. What Jeff implemented was done in a way that
follows the existing structure for how all of the other nodes work and
how HashAgg was *intended* to work (as in- if we thought the HashAgg
would go over work_mem, we wouldn't pick it and would do a GroupAgg
instead). If there's bugs in his implementation (which I doubt, but it
can happen, of course) then that'd be useful to discuss and look at
fixing, but this discussion isn't appropriate for beta.
Thanks,
Stephen
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2020-07-10 14:34:15 | Re: Default setting for enable_hashagg_disk |
Previous Message | Stephen Frost | 2020-07-10 13:43:51 | Re: Default setting for enable_hashagg_disk |
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2020-07-10 14:34:15 | Re: Default setting for enable_hashagg_disk |
Previous Message | Tom Lane | 2020-07-10 14:10:46 | Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions |