Re: postmaster deadlock while logging after syslogger exited

From: David Pacheco <dap(at)joyent(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: postmaster deadlock while logging after syslogger exited
Date: 2017-11-06 21:46:30
Message-ID: CACukRjMW2PJ=Lvk1+NOU3Jxgrwe_MB+=X7_+xT0-UZ=OTh_GZA@mail.gmail.com
Lists: pgsql-general

On Mon, Nov 6, 2017 at 12:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> David Pacheco <dap(at)joyent(dot)com> writes:
> > ... that process appears to have exited due to a fatal error
> > (out of memory). (I know it exited because the process still exists
> > in the kernel -- it hasn't been reaped yet -- and I think it ran out
> > of memory based on a log message I found from around the time when
> > the process exited.)
>
> Could we see the exact log message(s) involved? It's pretty hard to
> believe that the logger would have consumed much memory.

Thanks for the quick reply!

Based on kernel state about the dead but unreaped syslogger process, I
believe the process exited at 2017-10-27T23:46:21.258Z. Here are all of
the entries in the PostgreSQL log from 23:19:12 until the top of the next
hour:
https://gist.githubusercontent.com/davepacheco/c5541bb464532075f2da761dd990a457/raw/2ba242055aca2fb374e9118045a830d08c590e0a/gistfile1.txt

There's no log entry at exactly 23:46:21 or even immediately before that,
but there are a lot of "out of memory" errors and a FATAL one at 23:47:28.
Unfortunately, we haven't configured logging to include the pid, so I can't
be sure which messages came from the syslogger.

There are also many log entries for some very long SQL queries. I'm sure
that contributed to this problem by filling up the pipe. I was able to
extract the contents of the pipe while the system was hung, and it was more
of these giant query strings.
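To make the failure mode concrete, here is a minimal sketch (my own illustration, not code from PostgreSQL) of how a pipe that nobody is draining fills its kernel buffer; a blocking writer in the same position would simply hang, which is the deadlock described above:

```python
import os
import fcntl

# Create a pipe and deliberately never read from it, like a dead
# syslogger leaving its pipe undrained.
r, w = os.pipe()

# Make the write end non-blocking so we can observe "pipe full" as an
# exception instead of hanging this demo the way a real writer would.
flags = fcntl.fcntl(w, fcntl.F_GETFL)
fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)

written = 0
try:
    while True:
        # Stand-in for the giant query strings filling the log pipe.
        written += os.write(w, b"x" * 4096)
except BlockingIOError:
    # Buffer is full; a blocking writer would be stuck here forever.
    pass

print("pipe filled after", written, "bytes")
os.close(r)
os.close(w)
```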

I think it's likely that this database instance was running in a container
with way too small a memory cap for the number of processes configured.
(This was a zone (a lightweight container) allocated with 2GB of memory and
configured with 512MB of shared_buffers and up to 200 connections.) I
expect that the system got to a persistent state of having basically no
memory available, at which point nearly any attempt to allocate memory
could fail. The syslogger itself may not have been using much memory.
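For a rough sense of the arithmetic, using only the figures above (the division into a per-backend budget is my own back-of-the-envelope framing, ignoring the OS, page cache, and per-backend overhead):

```python
# Figures from the zone described above.
total_mb = 2048          # container memory cap
shared_buffers_mb = 512  # shared_buffers setting
max_connections = 200    # connection limit

# Naive per-backend budget after shared_buffers is carved out.
per_backend_mb = (total_mb - shared_buffers_mb) / max_connections
print("roughly %.2f MB per backend" % per_backend_mb)
```

At well under 8 MB per backend before counting anything else, it is not surprising that allocations started failing across the board.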

So I'm not so much worried about the memory usage itself, but it would be
nice if this condition were handled better. Handling out-of-memory is
obviously hard, especially when it means being unable to fork, but even
crashing would have been better for our use case. And of course, there are
other reasons besides low memory that the syslogger could exit prematurely,
and those might be more recoverable.

Thanks,
Dave
