Re: Default setting for enable_hashagg_disk

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-24 19:14:33
Message-ID: 20200624191433.5gnqgrxfmucexldm@alap3.anarazel.de
Lists: pgsql-docs pgsql-hackers

Hi,

On 2020-06-24 14:11:57 +1200, David Rowley wrote:
> 1. Statistics underestimation can cause hashagg to be selected. The
> executor will spill to disk in PG13. Users may find performance
> suffers as previously the query may have just overshot work_mem
> without causing any OOM issues. Their I/O performance might be
> terrible.

> 2. We might now choose to hash aggregate where pre PG13, we didn't
> choose that because the hash table was estimated to be bigger than
> work_mem. Hash agg might not be the best plan for the job.

> For #1. We know users are forced to run smaller work_mems than they
> might like as they need to allow for that random moment where all
> backends happen to be doing that 5-way hash join all at the same time.
> It seems reasonable that someone might want the old behaviour. They
> may well be sitting on a timebomb that's about to OOM, but it would be
> sad if someone's upgrade to PG13 was blocked on this, especially if
> it's just due to some query that runs once per month but needs to
> perform quickly.

I'm quite concerned about this one. I think this isn't just going to hit
when the planner mis-estimates ndistinct, but also when transition
values use a bit more space. We'll now start spilling in cases where
the < v13 planner did everything right.

That's great for cases where we'd otherwise OOM, but for a lot of other
cases where there actually is more than sufficient RAM to overrun
work_mem by a single-digit factor, it can cause a pretty massive
increase in IO compared to < v13.

FWIW, my gut feeling is that we'll end up having to separate the
"execution time" spilling from plain work_mem, because otherwise it'll
trigger spilling too often. E.g. if the plan isn't expected to spill,
only spill at 10 x work_mem or something like that. Or we'll need
better management of temp file data when there's plenty of memory
available.
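
To make that concrete, here is a minimal sketch of such an
execution-time heuristic. The function name, the "planned_to_spill"
flag, and the 10x factor are purely illustrative assumptions, not
actual PostgreSQL code:

#include "postgres.h"
#include "miscadmin.h"      /* declares work_mem */

/*
 * Illustrative sketch only: if the planner did not expect this hash
 * aggregate to spill, tolerate overrunning work_mem by some factor
 * before spilling, instead of spilling the moment work_mem is
 * exceeded.
 */
static bool
hashagg_should_spill(Size mem_used, bool planned_to_spill)
{
    Size    limit = (Size) work_mem * 1024;    /* work_mem is in kB */

    if (!planned_to_spill)
        limit *= 10;    /* e.g. only spill at 10 x work_mem */

    return mem_used > limit;
}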

> For #2. This seems like a very legitimate requirement to me. A user
> might be unhappy that PG13 now hashaggs where before it sorted and
> group aggregated, not because there's some issue with hashagg
> spilling, but because that causes the node above the agg to become a
> Hash Join rather than a Merge Join, and that's bad for some existing
> reason. Our planner doing the wrong thing based on a lack of,
> inaccurate, or out-of-date statistics is not Jeff's fault. Having
> the ability to switch off a certain planner feature is just
> following along with what we do today for many other node types.

This one concerns me a bit less, fwiw. There's a lot more "pressure" in
the planner to choose hash agg or sorted agg, compared to e.g. a bunch
of aggregate states taking up a bit more space (we can't estimate that
at all for many aggregates).

> As for GUCs to try to help the group of users who, *I'm certain*,
> will have problems with PG13's plan choice: I think the overloaded
> enable_hashagg option is a really nice compromise. We don't really
> have any other executor node type that has multiple GUCs controlling
> its behaviour, so I believe it would be nice to keep it that way.
>
> How about:
>
> enable_hashagg = "on" -- enables hashagg allowing it to freely spill
> to disk as it pleases.
> enable_hashagg = "trynospill" -- Planner will only choose hash_agg if
> it thinks it won't spill (pre PG13 planner behaviour)
> enable_hashagg = "neverspill" -- executor will *never* spill to disk
> and can still OOM (NOT RECOMMENDED, but does give pre PG13 planner and
> executor behaviour)
> enable_hashagg = "off" -- planner does not consider hash agg, ever.
> Same as what PG12 did for this setting.
>
> Now, it's a bit weird to have "neverspill" as this is controlling
> what's done in the executor from a planner GUC. Likely we can just
> work around that by having a new "allowhashspill" bool field in the
> "Agg" struct that's set by the planner, say during createplan, which
> controls whether nodeAgg.c is allowed to spill or not. That'll also
> allow PREPAREd plans to continue to do what they had planned to do
> already.
>
> The thing I like about doing it this way is that:
>
> a) it does not add any new GUCs
> b) it semi-hides the weird values that we really wish nobody would
> ever have to set, in a GUC that people have become used to just
> allowing the values "on" and "off".
>
> The thing I don't quite like about this idea is:
> a) I wish the planner was perfect and we didn't need to do this.
> b) It's a bit weird to overload a GUC that has a very booleanish name
> to not be bool.
>
> However, I also think it's pretty lightweight to support this. I
> imagine a dozen lines of docs and likely about half a dozen lines per
> GUC option in the planner.

That'd work for me, but I honestly don't particularly care about the
specific naming, as long as we provide users an escape hatch from the
increased amount of IO.
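
For illustration, a rough sketch of how David's proposed values could
be wired up as an enum GUC. Only the option strings come from the
proposal above; the HashAggMode names are assumptions for the sake of
the sketch:

#include "postgres.h"
#include "utils/guc.h"

/* Hypothetical planner modes for the overloaded enable_hashagg GUC. */
typedef enum
{
    HASHAGG_OFF,            /* planner never considers hash agg */
    HASHAGG_TRY_NO_SPILL,   /* choose hashagg only if it's expected to fit in work_mem */
    HASHAGG_NEVER_SPILL,    /* executor never spills; can OOM */
    HASHAGG_ON              /* hashagg chosen freely, may spill */
} HashAggMode;

static const struct config_enum_entry enable_hashagg_options[] = {
    {"off", HASHAGG_OFF, false},
    {"trynospill", HASHAGG_TRY_NO_SPILL, false},
    {"neverspill", HASHAGG_NEVER_SPILL, false},
    {"on", HASHAGG_ON, false},
    {NULL, 0, false}
};

The planner would then record the executor-relevant part in the plan
itself, e.g. by setting the proposed "allowhashspill" field on the Agg
node during createplan, so that PREPAREd statements keep behaving the
way they were planned.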

Greetings,

Andres Freund
