Re: [GENERAL] postmaster deadlock while logging after syslogger exited

From: David Pacheco <dap(at)joyent(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] postmaster deadlock while logging after syslogger exited
Date: 2017-12-04 22:55:02
Message-ID: CACukRjMrSo4bJeOenrKhO4ixLOzBLADCie+uzxW4VV8-8eyh1w@mail.gmail.com
Lists: pgsql-general

Thanks again for helping out.

On Mon, Dec 4, 2017 at 2:12 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2017-12-04 13:57:52 -0800, David Pacheco wrote:
> > On Mon, Dec 4, 2017 at 12:23 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > FWIW, I'd like to see a report of this around the time the issue
> > > occurred before doing anything further here.
> > >
> >
> >
> > This failure begins when this process exits, so the best you could get is
> > memory in use immediately before it exited. I obviously can't get that now
> > for the one instance I saw weeks ago, but maybe PostgreSQL could log
> > information about current memory usage when it's about to exit because of
> > ENOMEM?
>
> It already does so.
>

In that case, do you have the information you need in the log that I posted
earlier in the thread?
(
https://gist.githubusercontent.com/davepacheco/c5541bb464532075f2da761dd990a457/raw/2ba242055aca2fb374e9118045a830d08c590e0a/gistfile1.txt
)

> What I was wondering about was the memory usage some time before it
> dies. In particular while the workload with the long query strings is
> running. ps output would be good, gdb'ing into the process and issuing
> MemoryContextStats(TopMemoryContext) would be better.
>

Would it make sense for PostgreSQL to periodically sample the memory used
by the current process, keep a small ringbuffer of recent samples, and then
log all of that when it exits due to ENOMEM?

One does not know that one is going to run into this problem before it
happens, and it may not happen very often. (I've only seen it once.) The
more PostgreSQL can keep the information needed to understand something
like this after the fact, the better -- particularly since the overhead
required to maintain this information should not be that substantial.
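
To make the ring buffer idea concrete, here is a minimal sketch in C. It is
not PostgreSQL code: MemRing, mem_ring_record, mem_ring_oldest and
mem_ring_dump are hypothetical names, and how a sample is measured (and how
often) is left to the caller. The buffer is fixed-size, so recording a
sample never allocates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

#define MEM_RING_SIZE 8         /* hypothetical: number of samples kept */

typedef struct
{
    time_t when;                /* when the sample was taken */
    size_t bytes;               /* memory in use, as measured by the caller */
} MemSample;

typedef struct
{
    MemSample samples[MEM_RING_SIZE];
    size_t    next;             /* slot the next sample is written to */
    size_t    count;            /* valid samples, at most MEM_RING_SIZE */
} MemRing;

/* Store one sample, overwriting the oldest once the ring is full. */
static void
mem_ring_record(MemRing *ring, time_t when, size_t bytes)
{
    assert(ring != NULL);
    ring->samples[ring->next].when = when;
    ring->samples[ring->next].bytes = bytes;
    ring->next = (ring->next + 1) % MEM_RING_SIZE;
    if (ring->count < MEM_RING_SIZE)
        ring->count++;
}

/* Return the oldest retained sample, or NULL if the ring is empty. */
static const MemSample *
mem_ring_oldest(const MemRing *ring)
{
    if (ring->count == 0)
        return NULL;
    return &ring->samples[(ring->next + MEM_RING_SIZE - ring->count)
                          % MEM_RING_SIZE];
}

/* Write all retained samples, oldest first; returns the number written. */
static size_t
mem_ring_dump(const MemRing *ring, FILE *out)
{
    size_t start = (ring->next + MEM_RING_SIZE - ring->count) % MEM_RING_SIZE;

    for (size_t i = 0; i < ring->count; i++)
    {
        const MemSample *s = &ring->samples[(start + i) % MEM_RING_SIZE];

        fprintf(out, "memory sample: t=%lld bytes=%zu\n",
                (long long) s->when, s->bytes);
    }
    return ring->count;
}
```

A process would call mem_ring_record from some periodic hook and
mem_ring_dump on the ENOMEM exit path. A real version would presumably have
to write through a path that neither allocates nor blocks, since stdio can
do both; that part is not addressed here.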

> > That way if anybody hits a similar condition in the future, the
> > data will be available to answer your question.
> >
> > That said, I think the deadlock itself is pretty well explained by the data
> > we have already.
>
> Well, it doesn't really explain the root cause, and thus the extent of
> the fixes required. If the root cause is the amount of memory used by
> syslogger, we can remove the deadlock, but the experience is still going
> to be bad. Obviously better, but still bad.
>

Fair enough. But we only know about one problem for sure, which is the
deadlock. There may be a second problem of the syslogger using too much
memory, but I don't think there's any evidence to point in that direction.
Once the whole system is out of memory (and it clearly was, based on the
log entries), anything that tried to allocate would fail, and the log
reflects that a number of different processes did fail to allocate memory.
I'd help investigate this question, but I have no more data about it, and
I'm not sure when I will run into this again.

Thanks,
Dave
