Quick Links

Re: PATCH: decreasing memory needlessly consumed by array_agg

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: PATCH: decreasing memory needlessly consumed by array_agg
Date:	2015-01-20 22:37:20
Message-ID:	54BED8A0.9070901@2ndquadrant.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 20.1.2015 21:13, Jeff Davis wrote:
> On Tue, Jan 20, 2015 at 6:44 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Jeff Davis <pgsql(at)j-davis(dot)com> writes:
>>> Tom (tgl),
>>> Is my reasoning above acceptable?
>>
>> Uh, sorry, I've not been paying any attention to this thread for awhile.
>> What's the remaining questions at issue?
>
> This patch is trying to improve the array_agg case where there are
> many arrays being constructed simultaneously, such as in HashAgg.
> You strongly suggested that a commitable patch would differentiate
> between the grouped and ungrouped cases (or perhaps you meant
> differentiate between HashAgg and sorted aggregation?). Tomas's
> current patch does not do so; it does two main things:

I don't think that's entirely true. The problem with the initial
(experimental) patch was that while it fixed aggregate queries, it
mostly ignored all the other callers, and either resulted in memory
corruption (unexpected pfree) or bloat (when not doint the pfree).

Tom's message where he points that out is here:
http://www.postgresql.org/message-id/20707.1396372100@sss.pgh.pa.us

The current patch does that distinction properly (IMHO), because it does
this distinction - all the callers using the underlying array functions
will use the original approach (with subcontexts), and only the
array_agg uses the new API and forces subcontext=false.

> 1. Uses a common memory context for arrays being constructed by
> array_agg in any context (ungrouped, sorted agg, and HashAgg)
> 2. Reduces the initial array allocation to 8 elements from 64, but
> preserves doubling behavior

Yes, that's true. I'd like to point out that while the current code uses
64 items, it also uses 8kB per-grop contexts. That's slightly overkill I
guess ...

> I don't see either as a big problem, but perhaps there are some
> downsides in some cases. I think a worst case would be a slowdown in
> the sorted agg case where every group has 64 elements, so I'll try
> to see if that's a real problem or not. If you saw a bigger problem,
> please let me know; and if not, I'll proceed with the review.

FWIW I've done a fair amount of measurements and not noticed any
measurable difference (unless using a rather crazy testcase, IIRC).
Certainly the issues with excessive memory consumption (and swapping)
were much worse.

> There are also some other concerns I have about the API ugliness,
> which I believe is the reason there's so much discussion about making
> the comments better. The reason the API is ugly is for backwards
> compatibility, so there's no perfect solution. Your opinion is welcome
> here too, but I mainly need to see if your objection above has been
> addressed.

I generally agree that having two API 'facets' with different behavior
is slightly awkward and assymetric, but I wouldn't call that ugly. I
actually modified both APIs initially, but I think Ali is right that not
breaking the existing API (and keeping the original behavior in that
case) is better. We can break it any time we want in the future, but
it's impossible to "unbreak it" ;-)

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Re: PATCH: decreasing memory needlessly consumed by array_agg at 2015-01-20 20:13:32 from Jeff Davis

Responses

Re: PATCH: decreasing memory needlessly consumed by array_agg at 2015-01-21 08:01:51 from Jeff Davis

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andrew Gierth	2015-01-20 23:27:43	Re: B-Tree support function number 3 (strxfrm() optimization)
Previous Message	Andrew Dunstan	2015-01-20 22:30:53	Re: Add min and max execute statement time in pg_stat_statement