Re: Using ProcSignal to get memory context stats from a running backend

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using ProcSignal to get memory context stats from a running backend
Date: 2017-12-19 13:44:34
Message-ID: CAMsr+YEMxhVDeyEKzUWtaMPrh0DNjQkTpdJsJFWhx3PC7hm6sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 18 December 2017 at 10:05, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer <craig(at)2ndquadrant(dot)com>
> wrote:
> > On 15 December 2017 at 09:24, Greg Stark <stark(at)mit(dot)edu> wrote:
> >> Another simpler option would be to open up a new file in the log
> >> directory
> >
> > ... if we have one.
> >
> > We might be logging to syslog, or whatever else.
> >
> > I'd rather keep it simple(ish).
>
> +1. I still think just printing it to the log is fine.
>
>
Here we go. Implemented pretty much as outlined above. A new
pg_diag_backend(pid) function sends a new
ProcSignalReason PROCSIG_DIAG_REQUEST. It's handled by
CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output to elog(...).

I didn't want to mess with the MemoryContextMethods and expose a
printf-wrapper style typedef in memnodes.h, so I went with a hook global.
It's a diagnostic routine so I don't think that's going to be a great
bother. By default it's set to write_stderr. That just writes to vfprintf
on unix so the outcome's the same as our direct use of fprintf now.

On Windows, though, using write_stderr will make Pg attempt to write memory
context dumps to the eventlog with ERROR level if running as a service with
no console. That means we vsnprintf the string into a static buffer first.
I'm not 100% convinced of the wisdom of that because it could flood the
event log, which is usually size limited by # of events and recycled. It'd
be better to write one event than write one per memory context line, but
it's not safe to do that when OOM. I lean toward this being a win: at least
Windows users should have some hope of finding out why Pg OOM'd, which
currently they don't when it runs as a service. If we don't, we should look
at using SetStdHandle to write stderr to a secondary logfile instead.

I'm happy to just add a trivial vfprintf wrapper so we preserve exactly the
same behaviour instead, but I figured I'd start with reusing write_stderr.

I'd really like to preserve the writing to elog(...) not fprintf, because
on some systems simply finding where stderr is written can be a challenge,
if it's not going to /dev/null via some detached session. Is it in
journald? In some separate log? Being captured by the logging collector
(and probably silently eaten)? Who knows!

Using elog(...) provides a log_line_prefix and other useful info to
associate the dump with the process of interest and what it's doing at the
time.

Also, an astonishing number of deployed systems I've worked with actually
don't put the pid or anything useful in log_line_prefix to make grouping up
multi-line output practical. Which is insane. But it's only PGC_SIGHUP so
fixing it is easy enough.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
v1-0001-Add-pg_diag_backend-to-print-memory-context-info.patch text/x-patch 14.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2017-12-19 13:44:35 Re: [HACKERS] path toward faster partition pruning
Previous Message Christoph Berg 2017-12-19 13:40:12 Re: Package version in PG_VERSION and version()