From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date: 2018-06-07 00:27:58
Message-ID: CAKJS1f_yPD8M0SPSAqHBvf-SpXQu_CxsYDb-aV3k4mk1nzkFFw@mail.gmail.com
Lists: pgsql-hackers
On 6 June 2018 at 01:17, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 6 June 2018 at 01:09, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2018-06-06 01:06:39 +1200, David Rowley wrote:
>>> My concern is that only accounting memory for the group and not the
>>> state is only solving half the problem. It might be fine for
>>> aggregates that don't stray far from their aggtransspace, but for the
>>> other ones, we could still see OOM.
>>
>>> If solving the problem completely is too hard, then a half fix (maybe
>>> 3/4) is better than nothing, but if we can get a design for a full fix
>>> before too much work is done, then isn't that better?
>>
>> I don't think we actually disagree. I was really primarily talking
>> about the case where we can't really do better because we don't have
>> serialization support. I mean we could just rescan from scratch, using
>> a groupagg, but that obviously sucks.
>
> I don't think we do. Taking yours to the 100% solution might just
> require adding the memory accounting to palloc that Jeff proposed a
> few years ago and using that accounting to decide when we should
> switch methods.
>
> However, I don't fully recall how the patch accounted for memory
> consumed by sub-contexts, or whether getting the entire consumption
> required recursively looking at subcontexts. If that's the case, then
> checking the consumption would likely cost too much if it was done
> after each tuple was aggregated.
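To make the recursion cost above concrete, here is a minimal sketch of summing allocations across a context and all of its sub-contexts. The struct and function names are hypothetical stand-ins, not PostgreSQL's actual MemoryContextData (which lives in nodes/memnodes.h); the point is only the shape of the walk.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for a memory context tree node;
 * not the real MemoryContextData.
 */
typedef struct MemCtx
{
    size_t          mem_allocated;  /* bytes this context holds */
    struct MemCtx  *firstchild;     /* head of child list */
    struct MemCtx  *nextchild;      /* next sibling */
} MemCtx;

/*
 * Walk a context and every sub-context beneath it, summing consumption.
 * Doing this after each aggregated tuple is the cost being worried
 * about above: it is linear in the number of sub-contexts.
 */
size_t
memctx_total_allocated(const MemCtx *ctx)
{
    size_t total = ctx->mem_allocated;

    for (const MemCtx *c = ctx->firstchild; c != NULL; c = c->nextchild)
        total += memctx_total_allocated(c);
    return total;
}
```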
I wonder if the whole internal-state memory accounting problem could
be solved by just adding an aggregate support function, for aggregates
with an internal transition state, that returns the number of bytes
consumed by the state. It might be good enough to fall back on
aggtransspace when the function is not defined. Such a function would
be about 3 lines long for string_agg and array_agg, and those are the
problem aggregates.
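As a rough illustration of how short such a support function could be, here is a sketch for string_agg against a simplified stand-in for its StringInfo transition state. The function name and signature are hypothetical (no such support function exists), and the struct mirrors but is not the real StringInfoData from lib/stringinfo.h.

```c
#include <stddef.h>

/*
 * Simplified stand-in for PostgreSQL's StringInfoData; field names
 * follow the real definition in lib/stringinfo.h.
 */
typedef struct StringInfoData
{
    char   *data;       /* palloc'd buffer */
    int     len;        /* bytes currently in use */
    int     maxlen;     /* allocated size of data */
    int     cursor;     /* read position (unused here) */
} StringInfoData;

/*
 * Hypothetical state-size support function for string_agg: report how
 * much memory the transition state currently holds.  Returning 0 for a
 * NULL state signals "no estimate", letting the caller fall back on
 * aggtransspace.
 */
size_t
string_agg_state_memsize(const StringInfoData *state)
{
    if (state == NULL)
        return 0;       /* caller falls back to aggtransspace */
    return sizeof(StringInfoData) + (size_t) state->maxlen;
}
```

The executor could then charge this amount against the hash table's memory budget per group, instead of assuming the declared aggtransspace.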
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services