Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)

From: Vladimir Rusinov <vrusinov(at)google(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Serge Rielau <serge(at)rielau(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: Treating work_mem as a shared resource (Was: Parallel Hash take II)
Date: 2017-11-17 16:09:12
Message-ID: CAE1wr-ykMDUFMjucDGqU-s98ARk3oiCfhxrHkajnb3f=Up70JA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 17, 2017 at 3:31 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Nov 16, 2017 at 11:50 AM, Serge Rielau <serge(at)rielau(dot)com> wrote:
>
>> Just as you have, we have also considered holistic provisioning of
>> work_mem across all consumers, but we find that to be too complex.
>> Having an “emergency fund” in shared memory is also an option, but I find
>> it too limiting.
>>
>
> I agree.
>
> I think this is basically a planning problem. For example, say we wanted
> to have work_mem_per_query instead of work_mem_per_node. There is an
> obvious design: consider memory use as an independent dimension of merit
> during path generation and comparison (less is better). Discard candidate
> paths whose memory use exceeds the work_mem_per_query budget unless there
> are no other alternatives. At the end of planning, pick the cheapest path
> that survived the memory-budget filter. Now, this has the problem that it
> would make planning more expensive (because we'd hang on to more paths for
> longer) but it solves a lot of other problems. If there's no memory
> pressure, we can use memory like mad even when it doesn't save much, but
> when we have to pick between using more memory for one part of the plan and
> using more memory for another part of the plan, the choice that does the
> best job reducing overall execution time will win. Awesome.
>
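
That final filter is easy to picture. A rough sketch of it, with invented
names (not the actual planner API, just the shape of the idea):

/*
 * Sketch only: keep memory as a separate dimension of merit and, at the
 * end of planning, take the cheapest path that fits the per-query budget,
 * falling back to the cheapest overall if nothing fits.
 */
#include <stddef.h>

typedef struct CandidatePath
{
    double      total_cost;     /* estimated execution cost */
    double      mem_kb;         /* estimated peak memory use, in kB */
} CandidatePath;

static CandidatePath *
pick_cheapest_within_budget(CandidatePath *paths, int npaths,
                            double work_mem_per_query_kb)
{
    CandidatePath *best_in_budget = NULL;
    CandidatePath *best_overall = NULL;

    for (int i = 0; i < npaths; i++)
    {
        CandidatePath *p = &paths[i];

        if (best_overall == NULL || p->total_cost < best_overall->total_cost)
            best_overall = p;

        if (p->mem_kb <= work_mem_per_query_kb &&
            (best_in_budget == NULL ||
             p->total_cost < best_in_budget->total_cost))
            best_in_budget = p;
    }

    /* Over-budget paths are discarded unless nothing else survived. */
    return best_in_budget != NULL ? best_in_budget : best_overall;
}
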
> We could also do more localized variants of this that don't provide hard
> guarantees but do tend to avoid squandering resources. I don't think that
> we can directly incorporate memory use into cost because that will distort
> the costs of higher-level nodes in the plan tree; cost needs to mean
> execution time. However, what we could do is refuse to replace a more
> expensive path in a relation's path list with a cheaper one when the
> savings are small and the cheaper path uses a lot more memory. That way,
> you wouldn't replace a nested loop that costs a million units with a hash
> join that costs 999,999 units but uses a GB of RAM; you'd save the hash
> join for cases where we think it will help significantly.
>
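
The replacement test could look roughly like this (the function name and
the 1% / 4x thresholds are invented purely for illustration; the real
add_path logic is of course more involved):

/*
 * Sketch only: refuse to let a slightly cheaper path displace an existing
 * one when it costs a lot more memory.
 */
#include <stdbool.h>

static bool
cheaper_path_should_displace(double old_cost, double old_mem_kb,
                             double new_cost, double new_mem_kb)
{
    if (new_cost >= old_cost)
        return false;           /* not cheaper at all */

    if (new_mem_kb <= old_mem_kb)
        return true;            /* cheaper and no hungrier: take it */

    /*
     * Cheaper but hungrier: require the savings to be meaningful before
     * paying a large memory premium, so a hash join that is 1 cost unit
     * cheaper doesn't displace a nested loop while eating a GB of RAM.
     */
    if (new_mem_kb > old_mem_kb * 4.0 &&
        (old_cost - new_cost) < old_cost * 0.01)
        return false;

    return true;
}
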
> Yet another thing we could do is to try to get nodes to voluntarily use
> less than work_mem when possible. This is particularly an issue for
> sorts. A 2-batch hash join is so much more expensive than a single-batch
> hash join that it's almost never going to make sense unless we have no
> realistic alternative, although I suppose a 64-batch hash join might not be
> that different from a 32-batch hash join. But for sorts, given all Peter's
> work in this area, I bet there are a lot of sorts that could budget a
> quarter or less of work_mem and really not be hurt very much. It depends
> somewhat on how fast and how contended your I/O is, though, which we don't
> have an especially good way to model. I'm starting to wonder if that
> sort_mem GUC might be a good idea... use that for sorts, and keep work_mem
> for everything else.
>
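
A voluntary budget could start out as simple as this (sort_mem here is
hypothetical, and a real version would want some model of how painful
spilling actually is):

/*
 * Sketch only: let a sort reserve less than its full budget when the
 * planner's input size estimate says it won't need it.
 */
static long
choose_sort_budget_kb(long sort_mem_kb, double est_input_kb)
{
    long        floor_kb = sort_mem_kb / 4;

    /* Tiny inputs: don't reserve the full budget just because we can. */
    if (est_input_kb <= (double) floor_kb)
        return floor_kb;

    /* Otherwise reserve roughly what we expect to need, capped at the
     * full budget. */
    if (est_input_kb < (double) sort_mem_kb)
        return (long) est_input_kb;

    return sort_mem_kb;
}
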
> If we really want to be able to dynamically react to changing memory
> conditions, what we need is a forest of plans for a given query rather than
> just one. Pick plan A if memory is limited, otherwise pick B. Or use
> admission control.
>
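
At its simplest the forest-of-plans idea is just a runtime switch between
a fast plan and a low-memory plan; a sketch (none of these names exist in
PostgreSQL today):

/*
 * Sketch only: pick between two pre-built plans for the same query based
 * on how much work_mem-style memory is currently free system-wide.
 */
typedef struct PlannedVariant
{
    void       *plan;           /* opaque plan handle */
    double      mem_kb;         /* memory this plan expects to use */
} PlannedVariant;

static void *
select_plan_for_execution(const PlannedVariant *low_mem,
                          const PlannedVariant *fast,
                          double free_work_mem_kb)
{
    if (fast->mem_kb <= free_work_mem_kb)
        return fast->plan;      /* plenty of headroom: take the fast plan */

    return low_mem->plan;       /* memory is tight: fall back */
}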

FWIW, the lack of a per-connection and/or global memory limit for work_mem
is a major PITA when running a shared and/or large-scale setup.

Currently we are doing a poor job with the work_mem parameter because we
don't have a good way to let our customers increase it without also giving
them the ability to shoot themselves in the foot.
Even a simple parameter limiting the global total number of work_mem
buffers would help here.
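
Even something as crude as a single shared counter would be an
improvement. Purely as an illustration (none of this exists today, and a
real version would keep the counter in PostgreSQL shared memory rather
than a process-local static):

/*
 * Sketch only: cap the total work_mem handed out across all backends.
 * Callers that can't reserve their full request have to run with less
 * or spill.
 */
#include <stdatomic.h>
#include <stdbool.h>

static _Atomic long total_work_mem_used_kb;                /* illustrative */
static long total_work_mem_limit_kb = 8L * 1024 * 1024;    /* e.g. 8 GB */

static bool
reserve_work_mem(long request_kb)
{
    long        old = atomic_load(&total_work_mem_used_kb);

    do
    {
        if (old + request_kb > total_work_mem_limit_kb)
            return false;       /* over the global cap: caller must adapt */
    } while (!atomic_compare_exchange_weak(&total_work_mem_used_kb,
                                           &old, old + request_kb));

    return true;
}

static void
release_work_mem(long request_kb)
{
    atomic_fetch_sub(&total_work_mem_used_kb, request_kb);
}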

--
Vladimir Rusinov
PostgreSQL SRE, Google Ireland

Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland
Registered in Dublin, Ireland
Registration Number: 368047
