Re: (full) Memory context dump considered harmful

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Subject: Re: (full) Memory context dump considered harmful
Date: 2015-08-22 17:16:22
Message-ID: 55D8AE66.7010800@2ndquadrant.com
Lists: pgsql-hackers


On 08/22/2015 06:06 PM, Tom Lane wrote:
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>>
>> Couldn't we make it a bit smarter to handle even cases like this? For
>> example we might first count/sum the child contexts, and then print
>> either all child contexts (if there are only a few of them) or just
>> those that are >5% of the total, 2x the average or something like that.
>
> That seems overly complicated. In the first place, you couldn't
> possibly implement that without two passes over the context set,
> which would be mighty expensive. (In the situations where this is
> getting run, most likely portions of the address space have been
> swapped out, and we'll have to swap them back in again. Bad enough to
> iterate over the whole process address space once, but twice?) In the
> second place, it would seem to require a much wider API between
> MemoryContextStats and the per-context-type stats methods, including
> for example a way to decide which numbers about a context were the
> most important ones. I'm already concerned about having had to expose
> a set of accumulatable numbers at all ... don't want the API to get
> into their semantics in that kind of detail.

Sure, the two passes seem quite annoying, although I'm not convinced
it'd be a problem in practice - most modern systems I've been dealing
with recently were configured with a 'no swapping' goal, using only a
tiny swap for extreme cases (e.g. 256GB RAM with 4GB of swap).

If only we had some memory accounting in place ;-) Then we wouldn't have
to do the two passes ...

> As I said, based on my experience in looking at memory context dumps
> (and I've seen a few), a show-the-first-N-children heuristic
> probably will work fine. If we find out differently we can work on
> refining it, but I don't think a complex design is warranted at this
> stage.

OK, let's go with the simple approach.
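For reference, the show-the-first-N-children heuristic might look
roughly like this (a Python sketch of the logic only, not the actual C
patch; the cap of 100 and the wording of the summary line are
illustrative):

```python
MAX_CHILDREN_PRINTED = 100  # per-level cap, mirroring the proposed patch

def dump_context(name, children, depth=0, lines=None):
    """Dump a context tree, printing at most MAX_CHILDREN_PRINTED
    children per level plus a one-line summary for the rest."""
    if lines is None:
        lines = []
    lines.append("  " * depth + name)
    for i, (child_name, child_children) in enumerate(children):
        if i >= MAX_CHILDREN_PRINTED:
            # summarize the remaining siblings in a single line
            lines.append("  " * (depth + 1) +
                         "... %d more child contexts not shown"
                         % (len(children) - MAX_CHILDREN_PRINTED))
            break
        dump_context(child_name, child_children, depth + 1, lines)
    return lines

# 150 sibling contexts: the first 100 are printed, the rest summarized
tree = [("ExprContext", []) for _ in range(150)]
out = dump_context("TopMemoryContext", tree)
```

With 150 siblings this yields 102 lines: the root, 100 children, and
one summary line - instead of 151 lines without the cap.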

I'm still not quite convinced that having no GUC to turn this off is a
good idea, though. From time to time we deal with OOM issues on
customer systems with very limited access (e.g. no gdb). In those cases
the memory context dump is the only information we have initially, and
it's usually quite difficult to get additional info.

I agree that the 'first-N-children' heuristic is going to work in most
cases, but when it doesn't it'd be nice to have a way to force "full"
stats in an ad-hoc manner (i.e. disable the limit before a query, etc.).

OTOH now that I'm thinking about it, most OOM errors I've seen (at least
on Linux) were caused either by exceptionally large palloc() requests
(e.g. due to varlena header corruption, overflow bugs, ...) or by the
OOM killer. In the first case the memory context stats are utterly
useless, because the issue has nothing to do with the other memory
contexts; in the latter case you don't get any stats at all, because the
process is simply killed.

One question regarding the proposed patch, though - if I read it
correctly (just looking at the diff), it simply limits the output to the
first 100 child contexts at each level, independently. So if each of
those 100 child contexts has >100 children of its own, we get 100x100
lines, and so on if the hierarchy is deeper. This is probably not
addressable without introducing some global counter of printed contexts,
and it may not be an issue at all (all the cases I can remember involved
either a single huge context or many sibling contexts).
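To make that concrete, a global counter would cap the total output
rather than the per-level fan-out; a hypothetical sketch (the budget of
1000 and the truncation message are made up for illustration):

```python
MAX_TOTAL_CONTEXTS = 1000  # hypothetical global budget

def dump_with_budget(name, children, depth=0, lines=None, budget=None):
    """Variant with one global counter: once the total number of printed
    contexts reaches MAX_TOTAL_CONTEXTS, stop producing output."""
    if lines is None:
        lines = []
    if budget is None:
        budget = [MAX_TOTAL_CONTEXTS]  # boxed so recursion shares it
    if budget[0] <= 0:
        return lines
    budget[0] -= 1
    lines.append("  " * depth + name)
    for child_name, child_children in children:
        if budget[0] <= 0:
            lines.append("  " * (depth + 1) + "... output truncated")
            break
        dump_with_budget(child_name, child_children, depth + 1,
                         lines, budget)
    return lines

# 100 parents x 100 children = 10101 contexts, but at most 1000 printed
tree = [("parent", [("child", []) for _ in range(100)])
        for _ in range(100)]
out = dump_with_budget("TopMemoryContext", tree)
```

The per-level cap alone would print all 10101 lines here, since no
single level exceeds 100 children; the global budget caps the dump at
1000 context lines regardless of shape.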

> One thing we could consider doing to improve the odds that it's fine
> would be to rearrange things so that child contexts of the same
> parent are more likely to be "similar" --- for example, break out
> all relcache entries to be children of a RelCacheContext instead of
> the generic CacheMemoryContext, likewise for cached plans, etc. But
> I'm not yet convinced that'd be worth the trouble.

That'd be nice, but I see that as an independent improvement - it might
improve the odds for internal contexts, but what about contexts coming
from user code (e.g. custom aggregates)?

kind regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
