Re: PATCH: decreasing memory needlessly consumed by array_agg

From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: decreasing memory needlessly consumed by array_agg
Date: 2014-12-22 01:08:24
Message-ID: CACQjQLq523T8NDG0fT+inxjOLGTKidJRsF78+kxUiBCQt8rFxQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> Another positive benefit is that this won't break the code unless it
> uses the new API. This is a problem especially with external code (e.g.
> extensions), but the new API (initArray*) is not part of 9.4 so there's
> no such code. So that's nice.
>
> The one annoying thing is that this makes the API slighly unbalanced.
> With the new API you can use a shared memory context, which with the old
> one (not using the initArray* methods) you can't.
>
> But I'm OK with that, and it makes the patch smaller (15kB -> 11kB).

Yes, with this API, we can backpatch this patch to 9.4 (or below) if we
need it there.

I think this API is a good compromise of old API and new API. Ideally if we
can migrate all code to new API (all code must call initArrayResult* before
accumArrayResult*), we can remove parameter MemoryContext rcontext from
accumArrayResult. Currently, the code isn't using the rcontext for anything
except for old API calls (in first call to accumArrayResult).

2014-12-21 20:38 GMT+07:00 Tomas Vondra <tv(at)fuzzy(dot)cz>:

> On 21.12.2014 02:54, Alvaro Herrera wrote:
> > Tomas Vondra wrote:
> >> Attached is v5 of the patch, fixing an error with releasing a shared
> >> memory context (invalid flag values in a few calls).
> >
> > The functions that gain a new argument should get their comment updated,
> > to explain what the new argument is for.
>
> Right. I've added a short description of the 'subcontext' parameter to
> all three variations of the initArray* function, and a more thorough
> explanation to initArrayResult().
>

With this API, i think we should make it clear if we call initArrayResult
with subcontext=false, we can't call makeArrayResult, but we must use
makeMdArrayResult directly.

Or better, we can modify makeArrayResult to release according to
astate->private_cxt:

> @@ -4742,7 +4742,7 @@ makeArrayResult(ArrayBuildState *astate,
> dims[0] = astate->nelems;
> lbs[0] = 1;
>
> - return makeMdArrayResult(astate, ndims, dims, lbs, rcontext, true);
> + return makeMdArrayResult(astate, ndims, dims, lbs, rcontext,
> astate->private_cxt);
>

Or else we implement what you suggest below (more comments below):

Thinking about the 'release' flag a bit more - maybe we could do this
> instead:
>
> if (release && astate->private_cxt)
> MemoryContextDelete(astate->mcontext);
> else if (release)
> {
> pfree(astate->dvalues);
> pfree(astate->dnulls);
> pfree(astate);
> }
>
> i.e. either destroy the whole context if possible, and just free the
> memory when using a shared memory context. But I'm afraid this would
> penalize the shared memory context, because that's intended for cases
> where all the build states coexist in parallel and then at some point
> are all converted into a result and thrown away. Adding pfree() calls is
> no improvement here, and just wastes cycles.
>

As per Tom's comment, i'm using "parent memory context" instead of "shared
memory context" below.

In the future, if some code writer decided to use subcontext=false, to save
memory in cases where there are many array accumulation, and the parent
memory context is long-living, current code can cause memory leak. So i
think we should implement your suggestion (pfreeing astate), and warn the
implication in the API comment. The API user must choose between
release=true, wasting cycles but preventing memory leak, or managing memory
from the parent memory context.

In one possible use case, for efficiency maybe the caller will create a
special parent memory context for all array accumulation. Then uses
makeArrayResult* with release=false, and in the end releasing the parent
memory context once for all.

Regards,
--
Ali Akbar

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-12-22 01:15:13 Re: btree_gin and ranges
Previous Message Michael Paquier 2014-12-22 00:56:35 Re: parallel mode and parallel contexts