From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date: 2018-06-05 13:25:07
Message-ID: 8f16eb97-09b0-e96b-ff78-d9677677909d@2ndquadrant.com
Lists: pgsql-hackers
On 06/05/2018 03:17 PM, David Rowley wrote:
> On 6 June 2018 at 01:09, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2018-06-06 01:06:39 +1200, David Rowley wrote:
>>> My concern is that only accounting memory for the group and not the
>>> state is only solving half the problem. It might be fine for
>>> aggregates that don't stray far from their aggtransspace, but for the
>>> other ones, we could still see OOM.
>>
>>> If solving the problem completely is too hard, then a half fix (maybe
>>> 3/4) is better than nothing, but if we can get a design for a full fix
>>> before too much work is done, then isn't that better?
>>
>> I don't think we actually disagree. I was really primarily talking
>> about the case where we can't really do better because we don't have
>> serialization support. I mean we could just rescan from scratch, using
>> a groupagg, but that obviously sucks.
>
> I don't think we do. To take yours to the 100% solution might just
> take adding the memory accounting to palloc that Jeff proposed a few
> years ago and use that accounting to decide when we should switch
> method.
>
> However, I don't quite fully recall how the patch accounted for
> memory consumed by sub-contexts and if getting the entire
> consumption required recursively looking at subcontexts. If that's
> the case then checking the consumption would likely cost too much if
> it was done after each tuple was aggregated.
Yeah, a simple wrapper would not really fix the issue, because allocating
memory in a subcontext would not update the accounting information, so
we'd be oblivious to possibly large amounts of memory. I don't quite see
how this is related to (not) having serialization support, though; it's
more about not knowing that we've already hit the memory limit.
That being said, I don't think array_agg actually has this issue, at
least since commit b419865a814abbca12bdd6eef6a3d5ed67f432e1 (it behaves
differently in aggregate and non-aggregate contexts, IIRC).
I don't know how common this issue is, though. I don't think any
built-in aggregates create additional contexts, but I may be wrong. But
we have this damn extensibility thing, where users can write their own
aggregates ...
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services