Re: PATCH: decreasing memory needlessly consumed by array_agg

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: decreasing memory needlessly consumed by array_agg
Date: 2014-04-01 17:08:20
Message-ID: 20707.1396372100@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
> I've been thinking about it a bit more and maybe the doubling is not
> that bad idea, after all.

It is not. There's a reason why that's our standard behavior.

> The "current" array_agg however violates some of the assumptions
> mentioned above, because it
> (1) pre-allocates quite large number of items (64) at the beginning,
> resulting in ~98% of memory being "wasted" initially
> (2) allocates one memory context per group, with 8kB initial size, so
> you're actually wasting ~99.999% of the memory
> (3) thanks to the dedicated memory contexts, the doubling is pretty
> much pointless up until you cross the 8kB boundary

> IMNSHO these are the issues we really should fix - by lowering the
> initial element count (64->4) and using a single memory context.

The real issue here is that all those decisions are perfectly reasonable
if you expect that a large number of values will get aggregated --- and
even if you don't expect that, they're cheap insurance in simple cases.
It only gets to be a problem if you have a lot of concurrent executions
of array_agg, such as in a grouped-aggregate query. You're essentially
arguing that in the grouped-aggregate case, it's better to optimize on
the assumption that only a very small number of values will get aggregated
(per hash table entry) --- which is possibly reasonable, but the argument
that it's okay to pessimize the behavior for other cases seems pretty
flimsy from here.

Actually, though, the patch as given outright breaks things for both the
grouped and ungrouped cases, because the aggregate no longer releases
memory when it's done. That's going to result in memory bloat not
savings, in any situation where the aggregate is executed repeatedly.

I think a patch that stood a chance of getting committed would need to
detect whether the aggregate was being called in simple or grouped
contexts, and apply different behaviors in the two cases. And you
can't just remove the sub-context without providing some substitute
cleanup mechanism. Possibly you could keep the context but give it
some much-more-miserly allocation parameters in the grouped case.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-04-01 17:36:02 Re: psql \d+ and oid display
Previous Message Alvaro Herrera 2014-04-01 17:04:09 Re: PATCH: decreasing memory needlessly consumed by array_agg