Re: What to do when dynamic shared memory control segment is corrupt

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sherrylyn Branchaw <sbranchaw(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: What to do when dynamic shared memory control segment is corrupt
Date: 2018-06-18 16:30:13
Message-ID: 28565.1529339413@sss.pgh.pa.us
Lists: pgsql-general

Sherrylyn Branchaw <sbranchaw(at)gmail(dot)com> writes:
> We are using Postgres 9.6.8 (planning to upgrade to 9.6.9 soon) on RHEL 6.9.
> We recently experienced two similar outages on two different prod
> databases. The error messages from the logs were as follows:
> LOG: server process (PID 138529) was terminated by signal 6: Aborted

Hm ... were these installations built with --enable-cassert? If not,
an abort trap seems pretty odd.
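
A rough way to check the build, sketched in Python: `pg_config --configure`
prints the options the binaries were configured with, so looking for
`--enable-cassert` in that list answers the question. The helper names
below are made up for illustration, and the sketch assumes the `pg_config`
on PATH belongs to the installation that crashed.

```python
import shlex
import subprocess


def built_with_cassert(configure_output: str) -> bool:
    """Return True if the configure option list includes --enable-cassert.

    `configure_output` is the text printed by `pg_config --configure`,
    a space-separated list of shell-quoted options.
    """
    return "--enable-cassert" in shlex.split(configure_output)


def check_local_install() -> bool:
    # Assumption: the pg_config found on PATH is the one that matches
    # the server binaries in question.
    result = subprocess.run(
        ["pg_config", "--configure"],
        capture_output=True,
        text=True,
        check=True,
    )
    return built_with_cassert(result.stdout)
```

On a running server, `SHOW debug_assertions;` should report the same
thing without needing shell access.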

> In one case, the logs recorded
> LOG: all server processes terminated; reinitializing
> LOG: incomplete data in "postmaster.pid": found only 1 newlines while
> trying to add line 7
> ...

> In the other case, the logs recorded
> LOG: all server processes terminated; reinitializing
> LOG: dynamic shared memory control segment is corrupt
> LOG: incomplete data in "postmaster.pid": found only 1 newlines while
> trying to add line 7
> ...

Those "incomplete data" messages are quite unexpected and disturbing.
I don't know of any mechanism within Postgres proper that would result
in corruption of the postmaster.pid file that way. (I wondered briefly
if trying to start a conflicting postmaster would result in such a
situation, but experimentation here says not.) I'm suspicious that
this may indicate a bug or unwarranted assumption in whatever scripts
you use to start/stop the postmaster. Whether that is at all related
to your crash issue is hard to say, but it bears looking into.
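
To make the message concrete, here is a minimal Python sketch of the
check it implies: before line N of postmaster.pid can be added, lines 1
through N-1 must already be there, each ending in a newline. The function
name and the sample file contents are hypothetical; this is an
illustration of the logic, not the server's code. It assumes, as in 9.6,
that the shared memory key goes on line 7.

```python
from typing import Tuple


def can_add_line(contents: str, target_line: int) -> Tuple[int, bool]:
    """Return (newlines found, whether line `target_line` can be added).

    Adding line N requires that the N-1 preceding lines exist and are
    newline-terminated, i.e. at least N-1 newlines in the file.
    """
    found = contents.count("\n")
    return found, found >= target_line - 1


# Hypothetical file holding only a PID, as a start/stop script that
# rewrites the file itself might leave behind.
pid_only = "138529\n"

# Hypothetical file with six complete lines (values are invented).
six_lines = "138529\n/data/pg\n1529339413\n5432\n/tmp\nlocalhost\n"

print(can_add_line(pid_only, 7))   # (1, False): "found only 1 newlines"
print(can_add_line(six_lines, 7))  # (6, True)
```

A file in the first state is what a script would produce if it
overwrote postmaster.pid with just a PID, which is one reason the
start/stop scripts are worth auditing.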

> My question is whether the corrupt shared memory control segment, and the
> failure of Postgres to automatically restart, mean the database should not
> be automatically started up, and if there's something we should be doing
> before restarting.

No, that looks like fairly typical crash recovery to me: corrupt shared
memory contents are expected and recovered from after a crash. However,
we don't expect postmaster.pid to get mucked with.

regards, tom lane
