Quick Links

Re: PATCH: decreasing memory needlessly consumed by array_agg

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: PATCH: decreasing memory needlessly consumed by array_agg
Date:	2015-01-28 20:56:00
Message-ID:	54C94CE0.7020007@2ndquadrant.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 28.1.2015 21:28, Tom Lane wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> writes:
>> On Tue, Jan 20, 2015 at 6:44 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Uh, sorry, I've not been paying any attention to this thread for awhile.
>>> What's the remaining questions at issue?
>
>> This patch is trying to improve the array_agg case where there are
>> many arrays being constructed simultaneously, such as in HashAgg. You
>> strongly suggested that a commitable patch would differentiate between
>> the grouped and ungrouped cases (or perhaps you meant differentiate
>> between HashAgg and sorted aggregation?). Tomas's current patch does
>> not do so; it does two main things:
>> 1. Uses a common memory context for arrays being constructed by
>> array_agg in any context (ungrouped, sorted agg, and HashAgg)
>> 2. Reduces the initial array allocation to 8 elements from 64, but
>> preserves doubling behavior
>
> Sorry for slow response on this. I took a look at the v8 patch
> (that's still latest no?). I see that it only gets rid of the

Yes, v8 is the latest version I submitted.

> separate context for the two callers array_agg_transfn and
> array_agg_array_transfn, for which no memory release can happen
> anyway until the parent context is reset (cf comments in
> corresponding finalfns). So that's fine --- it is not making any
> bloat case any worse, at least.
>
> I'm not sure about whether reducing the initial Datum array size
> across-the-board is a good thing or not. One obvious hack to avoid
> unexpected side effects on other callers would be
>
> astate->alen = (subcontext ? 64 : 8);
>
> The larger initial size is basically free when subcontext is true,
> anyway, considering it will be coming out of an 8K initial subcontext
> allocation.

Seems like a good idea. If we're using 8kB contexts, we can preallocate
64 elements right away.

But maybe we could decrease the 8kB context size to 1kB? For 64 elements
is just 512B + 64B for the arrays, and a bit of space for the
ArrayBuildState. So maybe ~600B in total.

Of course, this does not include space for pass-by-ref values.

Anyway, I have no problem with this - I mostly care just about the
array_agg() case. All the other places may adopt the 'common context'
approach to get the benefit.

> This still leaves us wondering whether the smaller initial size will
> hurt array_agg itself, but that's at least capable of being tested
> with a reasonably small number of test cases.

I plan to do more testing to confirm this - my initial testing seemed to
confirm this, but I'll repeat that with the current patch.

> I strongly object to removing initArrayResultArr's element_type
> argument. That's got nothing to do with the stated purpose of the
> patch, and it forces a non-too-cheap catalog lookup to be done
> inside initArrayResultArr. It's true that some of the current call
> sites are just doing the same lookup themselves anyway, but we should
> not foreclose the possibility that the caller has the data already
> (as some others do) or is able to cache it across multiple uses. A
> possible compromise is to add the convention that callers can pass
> InvalidOid if they want initArrayResultArr to do the lookup. (In any
> case, the function's header comment needs adjustment if the API spec
> changes.)

Fair point, and the InvalidOid approach seems sane to me.

>
> Other than that, I think the API changes here are OK. Adding a new
> argument to existing functions is something we do all the time, and
> it's clear how to modify any callers to get the same behavior as
> before. We could perhaps clean things up with a more invasive
> redefinition, but I doubt it's worth inflicting more pain on third
> party callers.
>
> Another thing I'd change is this:
>
> + /* we can only release the context if it's a private one. */
> + Assert(! (release && !astate->private_cxt));
> +
> /* Clean up all the junk */
> if (release)
> MemoryContextDelete(astate->mcontext);
>
> Why not this way:
>
> /* Clean up all the junk */
> if (release)
> + {
> + Assert(astate->private_cxt);
> MemoryContextDelete(astate->mcontext);
> + }
>
> Seems a lot more understandable, and less code too.

Yeah, I agree it's easier to understand.

> I concur with the concerns that the comments could do with more
> work, but haven't attempted to improve them myself.

There were a few comments about this, after the v8 patch, with
recommended comment changes.

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Re: PATCH: decreasing memory needlessly consumed by array_agg at 2015-01-28 20:28:54 from Tom Lane

Responses

Re: PATCH: decreasing memory needlessly consumed by array_agg at 2015-01-28 22:25:53 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2015-01-28 21:10:42	Re: Make hba available to client code
Previous Message	David Fetter	2015-01-28 20:49:31	Make hba available to client code