Re: [GENERAL] postmaster deadlock while logging after syslogger exited

From: David Pacheco <dap(at)joyent(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] postmaster deadlock while logging after syslogger exited
Date: 2017-11-20 19:12:08
Message-ID: CACukRjO1Cq3FSNznwn_896xZrGXK1wHO6J75h64e7zBw+OxRVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks again for looking at this issue.

On Mon, Nov 20, 2017 at 10:34 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2017-11-20 10:13:57 -0800, David Pacheco wrote:
> > I expect what happened is that the syslogger process attempted to
> allocate
> > memory, failed because the system was low, and explicitly exited. That's
> > consistent with an exited process, no core file generated, and the
> "FATAL"
> > "out
> > of memory" entries in the log I posted.
>
> Well unfortunately we're presumably not going to see output from
> syslogger's death itself, chicken and egg etc.
>

If that's true, I think that's a separate bug. If a process receives
ENOMEM from an attempt to allocate memory, it can still write out a log
message to the already-open file descriptor indicating that it's exiting
because of that failure. For all I know, that's what happened here, as
there are FATAL "out of memory" entries in the log. (I'd be able to tell
for sure if we had configured logging to include pids in the log messages.)

> Tom Lane wrote:
> > > Hm, so that's another angle David didn't report on: is it possible that
> > > his workload could have resulted in a very large volume of incomplete
> > > in-progress log messages?
> >
> > Yes. I mentioned in my Nov 6 mail that large log messages over a
> > short period appear to have been a major contributing factor:
>
> So it's actually quite possible that syslogger did use a fair amount of
> memory. Could you show its memory usage, just via ps?
>

I don't have information from the problematic case because the process had
already exited. Right now, for the same database (actually a different
instance, because this incident triggered a failover):

$ ps -opid,rss,vsz,args -p 37627
PID RSS VSZ COMMAND
37627 2980 14968 /opt/postgresql/9.2.4/bin/postgres -D /manatee/pg/data

I'm not sure what we can infer from that, as this is a different system,
and the workload that generates the very large query strings only runs
occasionally. Those strings are also not logged unless something's gone
wrong.

> This is a critical point. As far as I can tell, all that's necessary
> > for this deadlock to occur is:
> >
> > - the syslogger is unable to make forward progress (e.g., because it
> > segfaulted)
>
> This specific case I don't care that much about, but ...

I understand if the community isn't interested in fixing this case if other
users aren't seeing it much, but surely it's still a bug that this unusual
case can result in a deadlock?

-- Dave

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andres Freund 2017-11-20 19:17:33 Re: [GENERAL] postmaster deadlock while logging after syslogger exited
Previous Message Jaimi McEntire 2017-11-20 19:11:27 RE: [MESSAGE PUBLICITAIRE- VIGILANCE] RE: Unsubscribe