Re: Default setting for enable_hashagg_disk

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-24 12:24:29
Message-ID: CAApHDvrcfOOjRmO5EZk_a_GKd-RUGG+2FTM=Sr4jZUbHaBJqjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Wed, 24 Jun 2020 at 21:06, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> I
> don't remember anyone complaining about spills to disk during merge
> join, so I am unclear why we would need a such control for hash join.

Hash aggregate, you mean? The reason is that upgrading to PG13 can
cause a performance regression for an underestimated ndistinct on the
GROUP BY clause and cause hash aggregate to spill to disk where it
previously did everything in RAM. Sure, that behaviour was never
what we wanted to happen, Jeff has fixed that now, but the fact
remains that this does happen in the real world quite often and people
often get away with it, likey because work_mem is generally set to
some very conservative value. Of course, there's also a bunch of
people that have been bitten by OOM due to this too. The "neverspill"
wouldn't be for those people. Certainly, it's possible that we just
tell these people to increase work_mem for this query, that way they
can set it to something reasonable and still get spilling if it's
really needed to save them from OOM, but the problem there is that
it's not very easy to go and set work_mem for a certain query.

FWIW, I wish that I wasn't suggesting we do this, but I am because it
seems simple enough to implement and it removes a pretty big roadblock
that might exist for a small subset of people wishing to upgrade to
PG13. It seems lightweight enough to maintain, at least until we
invent some better management of how many executor nodes we can have
allocating work_mem at once.

The suggestion I made was just based on asking myself the following
set of questions:

Since Hash Aggregate has been able to overflow work_mem since day 1,
and now that we've completely changed that fact in PG13, is that
likely to upset anyone? If so, should we go to the trouble of giving
those people a way of getting the old behaviour back? If we do want to
help those people, what's the best way to make those options available
to them in a way that we can remove the special options with the least
pain in some future version of PostgreSQL?

I'd certainly be interested in hearing how other people would answer
those question.

David

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Justin Pryzby 2020-06-24 12:38:43 Re: Default setting for enable_hashagg_disk
Previous Message Bruce Momjian 2020-06-24 09:06:28 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-06-24 12:38:43 Re: Default setting for enable_hashagg_disk
Previous Message Thomas Kellerer 2020-06-24 12:23:08 Re: Why forbid "INSERT INTO t () VALUES ();"