I ran into an issue restarting a box that had 72 instances of PostgreSQL
running when its power was cycled. The problem appeared to be that the old
pid files, which PostgreSQL never got a chance to clean up, were interfering
when the instances were started or stopped, because the pids recorded in them
had been reused by instances at different locations. I posted on the hackers list
because I thought that there might be a way for PostgreSQL to detect this
and prevent the problems. That is not feasible, but people offered three
suggestions for how to avoid running into the problem. I'm posting a
summary of those solutions here on the admin list, to aid those who may
hit the issue and who may not be searching the developer archives for a
solution.
The original thread is here:
http://archives.postgresql.org/pgsql-hackers/2007-08/msg01194.php
The three solutions suggested were:
(1) Run each instance under a different OS user.
(2) Set up a script to clean up old pid files when the OS is booted.
(3) Have a script that isn't used for day-to-day management of the instances,
but only to clean up the pid files and start the instances after an abrupt
failure such as a power loss. (For example, we might do something like turn
off our existing PostgreSQL instance services under chkconfig and have a
"start-all" script to delete pid files and start the services.)
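For anyone setting up (2) or (3), here is a minimal Python sketch of the idea. It assumes the data directories are listed by hand (the DATA_DIRS paths below are hypothetical), that it runs as the OS user owning the instances before any of them has been started, and that each instance can be started with "pg_ctl -D <dir> start". Because no instance can be running at that point, any postmaster.pid it finds must be stale.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical data directories; list every instance on the box.
DATA_DIRS = [
    "/var/pgsql/data/instance01",
    "/var/pgsql/data/instance02",
]


def remove_stale_pid_files(data_dirs: Iterable[str]) -> List[Path]:
    """Delete postmaster.pid from each data directory; return what was removed.

    Only safe when no instance is running (at boot, or right after an
    abrupt failure), since then every pid file left behind is stale.
    """
    removed: List[Path] = []
    for data_dir in data_dirs:
        pid_file = Path(data_dir) / "postmaster.pid"
        if pid_file.is_file():
            pid_file.unlink()
            removed.append(pid_file)
    return removed


def _run_checked(cmd: List[str]) -> None:
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(cmd, check=True)


def start_all(
    data_dirs: Iterable[str],
    run: Callable[[List[str]], object] = _run_checked,
) -> None:
    """A "start-all" sketch: clear stale pid files, then start each instance."""
    dirs = list(data_dirs)
    remove_stale_pid_files(dirs)
    for data_dir in dirs:
        # Assumes pg_ctl is on PATH; adjust if your services start differently.
        run(["pg_ctl", "-D", data_dir, "start"])
```

For (2), a boot-time init script would call remove_stale_pid_files(DATA_DIRS) before the regular services start. For (3), start_all(DATA_DIRS) would be run by hand after a power loss, with the regular services left off under chkconfig; if your instances are normally started through service scripts, change the command built in start_all accordingly.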
Any of the above should keep you out of trouble on a machine with multiple
PostgreSQL instances. I do recommend that people in such a situation set
themselves up with one of these techniques before they hit a power failure
or hardware failure that might expose the issue, since it can be a little
confusing in the midst of recovery. (Hmmm... Maybe I'll try to find a
good place to put this information in the docs, and submit a doc patch....)
-Kevin