Re: Default setting for enable_hashagg_disk

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Rowley <dgrowleyml(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-26 23:58:50
Message-ID: 20200626235850.gvl3lpfyeobu4evi@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Fri, Jun 26, 2020 at 07:00:20PM -0400, Bruce Momjian wrote:
>On Fri, Jun 26, 2020 at 07:45:13PM +0200, Tomas Vondra wrote:
>> On Fri, Jun 26, 2020 at 12:37:26PM -0400, Bruce Momjian wrote:
>> > I was thinking more of being able to allocate a single value to be
>> > shared by all active sesions.
>>
>> Not sure I understand. What "single value" do you mean?
>
>I was thinking of a full-cluster work_mem maximum allocation that could
>be given to various backends that request it.
>
>Imagine we set the cluster-wide total of work_mem to 1GB. If a session
>asks for 100MB, if there are no other active sessions, it can grant the
>entire 100MB. If there are other sessions running, and 500MB has
>already been allocated, maybe it is only given an active per-node
>work_mem of 50MB. As the amount of unallocated cluster-wide work_mem
>gets smaller, requests are granted smaller actual allocations.
>
>What we do now makes little sense, because we might have lots of free
>memory, but we force nodes to spill to disk when they exceed a fixed
>work_mem. I realize this is very imprecise, because you don't know what
>future work_mem requests are coming, or how long until existing
>allocations are freed, but it seems it would have to be better than what
>we do now.
>
>> Wasn't the idea was to replace work_mem with something like query_mem?
>> That'd be nice, but I think it's inherently circular - we don't know how
>> to distribute this to different nodes until we know which nodes will
>> need a buffer, but the buffer size is important for costing (so we need
>> it when constructing the paths).
>>
>> Plus then there's the question whether all nodes should get the same
>> fraction, or less sensitive nodes should get smaller chunks, etc.
>> Ultimately this would be based on costing too, I think, but it makes it
>> soe much complex ...
>
>Since work_mem affect the optimizer choices, I can imagine it getting
>complex since nodes would have to ask the global work_mem allocator how
>much memory it _might_ get, but then ask for final work_mem during
>execution, and they might differ. Still, our spill costs are so high
>for so many node types, that reducing spills seems like it would be a
>win, even if it sometimes causes poorer plans.
>

I may not understand what you mean by "poorer plans" here, but I find it
hard to accept that reducing spills is generally worth poorer plans.

I agree larger work_mem for hashagg (and thus less spilling) may mean
lower work_mem for so some other nodes that are less sensitive to this.
But I think this needs to be formulated as a cost-based decision,
although I don't know how to do that for the reasons I explained before
(bottom-up plan construction vs. distributing the memory budget).

FWIW some databases already do something like this - SQL Server has
something called "memory grant" which I think mostly does what you
described here.

>> > Also, doesn't this blog entry also show that spiling to disk for ORDER
>> > BY is similarly slow compared to hash aggs?
>> >
>> > https://momjian.us/main/blogs/pgblog/2012.html#February_2_2012
>>
>> The post does not mention hashagg at all, so I'm not sure how could it
>> show that? But I think you're right the spilling itself is not that far
>> away, in most cases (thanks to the recent fixes made by Jeff).
>
>Yeah, I was just measuring ORDER BY spill, but it seems to be a similar
>overhead to hashagg spill, which is being singled out in this discussion
>as particularly expensive, and I am questioning that.
>

The difference between sort and hashagg spills is that for sorts there
is no behavior change. Plans that did (not) spill in v12 will behave the
same way on v13, modulo some random perturbation. For hashagg that's not
the case - some queries that did not spill before will spill now.

So even if the hashagg spills are roughly equal to sort spills, both are
significantly more expensive than not spilling.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Bruce Momjian 2020-06-27 00:05:16 Re: Default setting for enable_hashagg_disk
Previous Message Bruce Momjian 2020-06-26 23:56:22 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2020-06-27 00:05:16 Re: Default setting for enable_hashagg_disk
Previous Message Bruce Momjian 2020-06-26 23:56:22 Re: Default setting for enable_hashagg_disk