Re: Default setting for enable_hashagg_disk

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Rowley <dgrowleyml(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-27 00:24:36
Message-ID: CAH2-Wz=osB4oi_nH8MnosYhVVSNOm5q3=exGe-b3q6gWOgf98w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Fri, Jun 26, 2020 at 4:59 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> I agree larger work_mem for hashagg (and thus less spilling) may mean
> lower work_mem for so some other nodes that are less sensitive to this.
> But I think this needs to be formulated as a cost-based decision,
> although I don't know how to do that for the reasons I explained before
> (bottom-up plan construction vs. distributing the memory budget).

Why do you think that it needs to be formulated as a cost-based
decision? That's probably true of a scheme that allocates memory very
intelligently, but what about an approach that's slightly better than
work_mem?

What problems do you foresee (if any) with adding a hash_mem GUC that
gets used for both planning and execution for hash aggregate and hash
join nodes, in about the same way as work_mem is now?

> FWIW some databases already do something like this - SQL Server has
> something called "memory grant" which I think mostly does what you
> described here.

Same is true of Oracle. But Oracle also has simple work_mem-like
settings for sorting and hashing. People don't really use them
anymore, but apparently it was once common for the DBA to explicitly
give over more memory to hashing -- much like the hash_mem setting I
asked about. IIRC the same is true of DB2.

> The difference between sort and hashagg spills is that for sorts there
> is no behavior change. Plans that did (not) spill in v12 will behave the
> same way on v13, modulo some random perturbation. For hashagg that's not
> the case - some queries that did not spill before will spill now.
>
> So even if the hashagg spills are roughly equal to sort spills, both are
> significantly more expensive than not spilling.

Just to make sure we're on the same page: both are significantly more
expensive than a hash aggregate not spilling *specifically*. OTOH, a
group aggregate may not be much slower when it spills compared to an
in-memory sort group aggregate. It may even be noticeably faster, due
to caching effects, as you mentioned at one point upthread.

This is the property that makes hash aggregate special, and justifies
giving it more memory than other nodes on a system-wide basis (the
same thing applies to hash join). This could even work as a multiplier
of work_mem.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Amit Kapila 2020-06-27 10:00:25 Re: Default setting for enable_hashagg_disk
Previous Message Bruce Momjian 2020-06-27 00:05:16 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-06-27 00:46:10 Re: Review for GetWALAvailability()
Previous Message Bruce Momjian 2020-06-27 00:05:16 Re: Default setting for enable_hashagg_disk