Re: Default setting for enable_hashagg_disk

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-24 19:31:22
Message-ID: 20200624193122.cqolrvl44fh4w4s4@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

Hi,

On 2020-06-24 14:40:50 -0400, Tom Lane wrote:
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> > On Wed, Jun 24, 2020 at 01:29:56PM -0400, Tom Lane wrote:
> >> If we feel we need something to let people have the v12 behavior
> >> back, let's have
> >> (1) enable_hashagg on/off --- controls planner, same as it ever was
> >> (2) enable_hashagg_spill on/off --- controls executor by disabling spill
>
> > What if a user specifies
> > enable_hashagg = on
> > enable_hashagg_spill = off
>
> It would probably be reasonable for the planner to behave as it did
> pre-v13, that is not choose hashagg if it estimates that work_mem
> would be exceeded. (So, okay, that means enable_hashagg_spill
> affects both planner and executor ... but ISTM it's just one
> behavior not two.)

There's two different reasons for spilling in the executor right now:

1) The planner estimated that we'd need to spill, and that turns out to
be true. There seems no reason to not spill in that case (as long as
it's enabled/chosen in the planner).

2) The planner didn't think we'd need to spill, but we end up using more
than work_mem memory.

nodeAgg.c already treats those separately:

void
hash_agg_set_limits(double hashentrysize, uint64 input_groups, int used_bits,
Size *mem_limit, uint64 *ngroups_limit,
int *num_partitions)
{
int npartitions;
Size partition_mem;

/* if not expected to spill, use all of work_mem */
if (input_groups * hashentrysize < work_mem * 1024L)
{
if (num_partitions != NULL)
*num_partitions = 0;
*mem_limit = work_mem * 1024L;
*ngroups_limit = *mem_limit / hashentrysize;
return;
}

We can't sensibly disable spilling when chosen at plan time, because
that'd lead to *more* OOMS than in v12.

ISTM that we should have one option that controls whether 1) is done,
and one that controls whether 2) is done. Even if the option for 2 is
off, we still should spill when the option for 1) chooses a spilling
plan. I don't think it makes sense for one of those options to
influence the other implicitly.

So maybe enable_hashagg_spilling_plan for 1) and
hashagg_spill_on_exhaust for 2).

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Andres Freund 2020-06-24 19:36:24 Re: Default setting for enable_hashagg_disk
Previous Message Robert Haas 2020-06-24 19:28:47 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2020-06-24 19:34:34 Re: Why forbid "INSERT INTO t () VALUES ();"
Previous Message Robert Haas 2020-06-24 19:28:47 Re: Default setting for enable_hashagg_disk