
Re: BUG #9721: Fatal error on startup: no free slots in PMChildFlags array

From:	Daniel Hahler <postgresql(at)thequod(dot)de>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: BUG #9721: Fatal error on startup: no free slots in PMChildFlags array
Date:	2014-03-25 15:17:52
Message-ID:	53319E20.9030006@thequod.de
Lists:	pgsql-bugs

On 25.03.2014 15:36, Alvaro Herrera wrote:
> Tom Lane wrote:
>> postgresql(at)thequod(dot)de writes:
>>> PostgreSQL just failed to startup after a reboot (which was forced via
>>> remote Ctrl-Alt-Delete on the PostgreSQL's containers host):
>>
>>> 2014-03-24 13:32:47 CET LOG: could not receive data from client: Connection
>>> reset by peer
>>> 2014-03-25 12:32:17 CET FATAL: no free slots in PMChildFlags array
>>> 2014-03-25 12:32:17 CET LOG: process 9975 releasing ProcSignal slot 108,
>>> but it contains 0
>>> 2014-03-25 12:32:17 CET LOG: process 9974 releasing ProcSignal slot 109,
>>> but it contains 0
>>> 2014-03-25 12:32:17 CET LOG: process 9976 releasing ProcSignal slot 110,
>>> but it contains 0
>>
>> That's odd (and as you say, unexpected) but this log extract doesn't give
>> much clue as to how we got into this state. What was going on before
>> this? In particular, it's hard to call this "failure to start up" when
>> you evidently had a hundred or so postmaster child processes already.
>> Could there have been some unexpected surge in the number of connection
>> attempts just after the database came up? Also, this extract doesn't look
>> like anything that would've caused the postmaster to decide to shut down
>> again, so what happened after that? Or in short, I want to see the rest
>> of the log not just this part.
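Tom's reasoning rests on the fact that the postmaster tracks its children in a fixed-size table. The FATAL message means every slot was already taken, which is why slot numbers above 100 point to roughly a hundred live children. The following is a toy Python model, not PostgreSQL's C implementation. The class and function names are hypothetical, and the size is arbitrary; in PostgreSQL the real table size scales with settings such as max_connections. It shows how the table fills up when children take slots and never give them back:

```python
# Toy model of a fixed-size child-slot table, loosely inspired by the
# postmaster's PMChildFlags bookkeeping. NOT PostgreSQL's implementation;
# names (ChildSlotArray, fill_until_failure) are hypothetical.

class ChildSlotArray:
    """Fixed number of slots; each live child process occupies one."""

    def __init__(self, size: int) -> None:
        self.slots = [None] * size  # None = unused, otherwise the owner's pid

    def assign(self, pid: int) -> int:
        # Linear search for an unused slot; return its 1-based slot number.
        for i, owner in enumerate(self.slots):
            if owner is None:
                self.slots[i] = pid
                return i + 1
        # Same wording as the FATAL message in the quoted log.
        raise RuntimeError("no free slots in PMChildFlags array")

    def release(self, slot: int) -> None:
        # A child that exits normally hands its slot back.
        self.slots[slot - 1] = None


def fill_until_failure(size: int, start_pid: int = 9000) -> int:
    """Start children that never exit; return how many obtained a slot."""
    table = ChildSlotArray(size)
    started = 0
    try:
        while True:
            table.assign(start_pid + started)
            started += 1
    except RuntimeError:
        return started
```

With a table of 4 slots, `fill_until_failure(4)` returns 4. The fifth child is the one that hits the error. If slots are released as children exit, the same table never runs out.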

That was the whole log.

The previous (rotated) log file contains only:
2014-03-22 03:51:37 CET LOG: could not receive data from client: Connection reset by peer
2014-03-22 03:52:25 CET LOG: could not receive data from client: Connection reset by peer
2014-03-22 03:59:31 CET LOG: could not receive data from client: Connection reset by peer
2014-03-22 04:00:18 CET LOG: could not receive data from client: Connection reset by peer
2014-03-22 06:03:06 CET LOG: could not receive data from client: Connection reset by peer

Should I increase the logging verbosity in case this happens again?
If so, to what level? (I have not configured logging yet, so it still uses the defaults from your Debian package.)

> Here's my guess --- this is a virtualized system that somehow dumped
> some state to disk to hibernate while the host was being rebooted; and
> then, when the host was up again, it tried to resurrect the virtual
> machine and found things to be all inconsistent.

Yes, the container was frozen during the reboot:

From the host:
Mar 25 11:54:48 HN kernel: [ 76.237452] CT: 144: started
Mar 25 11:55:03 HN kernel: [ 91.201145] CT: 144: restored

By default, OpenVZ uses "suspend" to stop containers when the host reboots.
I will change this to "stop" for the PostgreSQL container, but this still seems like something PostgreSQL should handle more gracefully.

FWIW, I have just suspended and restarted the container manually, and PostgreSQL kept running (I had upgraded to 9.3.4 in the meantime).

Maybe it's a bug in OpenVZ and in how it restores some resources after the host reboots?

Please also note that the PostgreSQL error occurred about half an hour after the host reboot, when the container was resumed.
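The "half an hour" figure can be checked against the two logs quoted above: the host's "CT: 144: restored" line and the PostgreSQL FATAL line. The following is a minimal Python sketch. It assumes that the host kernel log uses the same time zone as the PostgreSQL log (CET) and that the host entry is from 2014. The host log states neither explicitly.

```python
from datetime import datetime

# Timestamps copied from the logs in this message.
# ASSUMPTION: the host kernel log is in the same time zone (CET) as the
# PostgreSQL log, and its "Mar 25" entry is from 2014.
RESTORED = "2014-03-25 11:55:03"  # host: "CT: 144: restored"
FATAL = "2014-03-25 12:32:17"     # PostgreSQL: "FATAL: no free slots in PMChildFlags array"

FMT = "%Y-%m-%d %H:%M:%S"


def gap_minutes(start: str, end: str) -> float:
    """Minutes elapsed between two timestamps in FMT."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60


print(f"{gap_minutes(RESTORED, FATAL):.1f} minutes")  # 37.2 minutes
```

Under that assumption, the gap is 37 minutes 14 seconds. If the host clock used a different zone, the result would shift by whole hours.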

Thanks,
Daniel.

--
http://daniel.hahler.de/

In response to

Re: BUG #9721: Fatal error on startup: no free slots in PMChildFlags array at 2014-03-25 14:36:47 from Alvaro Herrera

Responses

Re: BUG #9721: Fatal error on startup: no free slots in PMChildFlags array at 2014-03-25 15:26:06 from Alvaro Herrera
Re: BUG #9721: Fatal error on startup: no free slots in PMChildFlags array at 2014-03-25 16:02:53 from Tom Lane
