Postmaster fails to shut down right after crash restart

From: Sergey Shinderuk <s(dot)shinderuk(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Postmaster fails to shut down right after crash restart
Date: 2025-04-24 12:06:19
Message-ID: 63dcad16-22de-4326-a395-5310bc7e05ff@postgrespro.ru
Lists: pgsql-hackers

Hello,

While developing a patch and running the regression tests, I noticed that
the postmaster can fail to shut down right after a crash restart. It can
get stuck in the PM_WAIT_BACKENDS state forever.

As far as I understand, the problem occurs when a shutdown signal is
received before getting PMSIGNAL_RECOVERY_STARTED from the startup
process. In that case the FatalError flag is not cleared, and the
postmaster is stuck in PM_WAIT_BACKENDS waiting for the checkpointer,
which ignores SIGTERM.
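
The ordering dependency described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the postmaster code: every toy_* name is hypothetical, IO workers are left out, children that honor SIGTERM are treated as exiting immediately, and two rules are assumed from the description rather than read from the source (the recovery-started signal clears the flag only when no shutdown is pending, and the checkpointer is part of the wait set only while the flag is set).

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not postmaster.c symbols. */
enum
{
    TOY_STARTUP = 1 << 0,
    TOY_BGWRITER = 1 << 1,
    TOY_CHECKPOINTER = 1 << 2
};

typedef struct ToyPostmaster
{
    bool        fatal_error;        /* stands in for the FatalError flag */
    bool        shutdown_requested; /* a fast shutdown request has arrived */
    unsigned    live_children;      /* bitmask of children still running */
} ToyPostmaster;

/* State right after "all server processes terminated; reinitializing". */
static ToyPostmaster
toy_crash_restart(void)
{
    ToyPostmaster pm = {true, false,
                        TOY_STARTUP | TOY_BGWRITER | TOY_CHECKPOINTER};

    return pm;
}

/* Assumption: the signal clears the flag only if no shutdown is pending. */
static void
toy_recovery_started(ToyPostmaster *pm)
{
    if (!pm->shutdown_requested)
        pm->fatal_error = false;
}

/* SIGTERM goes to every child; here only the checkpointer ignores it. */
static void
toy_fast_shutdown(ToyPostmaster *pm)
{
    pm->shutdown_requested = true;
    pm->live_children &= TOY_CHECKPOINTER;
}

/* Assumption: the checkpointer is waited for only while fatal_error is set. */
static bool
toy_wait_backends_done(const ToyPostmaster *pm)
{
    unsigned    wait_mask = TOY_STARTUP | TOY_BGWRITER;

    if (pm->fatal_error)
        wait_mask |= TOY_CHECKPOINTER;
    return (pm->live_children & wait_mask) == 0;
}

/* Replay the two events in either order; true means the wait phase ends. */
bool
toy_shutdown_completes(bool shutdown_arrives_first)
{
    ToyPostmaster pm = toy_crash_restart();

    if (shutdown_arrives_first)
    {
        toy_fast_shutdown(&pm);
        toy_recovery_started(&pm);
    }
    else
    {
        toy_recovery_started(&pm);
        toy_fast_shutdown(&pm);
    }
    return toy_wait_backends_done(&pm);
}
```

In this model toy_shutdown_completes(false) returns true, while toy_shutdown_completes(true) returns false: the flag stays set, the checkpointer stays in the wait set, and nothing ever makes it exit.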

To reproduce the problem easily, I added a pg_usleep() call in
xlogrecovery.c just before SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED).
See the attached patch.

Then I ran a script that simulates a crash and then runs pg_ctl stop:

$ ./init.sh
[...]

$ ./stop-after-crash.sh
waiting for server to start.... done
server started
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down

Some processes are still alive:

$ ps uf -C postgres
USER      PID %CPU %MEM    VSZ   RSS TTY STAT START TIME COMMAND
sergey 279874  0.0  0.0 222816 28560 ?   Ss   14:25 0:00 /home/sergey/pgwork/devel/install/bin/postgres -D data
sergey 279887  0.0  0.0 222772  5664 ?   Ss   14:25 0:00  \_ postgres: io worker 0
sergey 279888  0.0  0.0 222772  5664 ?   Ss   14:25 0:00  \_ postgres: io worker 1
sergey 279889  0.0  0.0 222772  5664 ?   Ss   14:25 0:00  \_ postgres: io worker 2
sergey 279891  0.0  0.0 222884  8480 ?   Ss   14:25 0:00  \_ postgres: checkpointer

Here is an excerpt from the debug log:

postmaster[279874] LOG: all server processes terminated; reinitializing
startup[279890] LOG: database system was interrupted; last known up at 2025-04-24 14:25:58 MSK
startup[279890] LOG: database system was not properly shut down; automatic recovery in progress

postmaster[279874] DEBUG: postmaster received shutdown request signal
postmaster[279874] LOG: received fast shutdown request
postmaster[279874] DEBUG: updating PMState from PM_STARTUP to PM_STOP_BACKENDS
postmaster[279874] DEBUG: sending signal 15/SIGTERM to background writer process with pid 279892
postmaster[279874] DEBUG: sending signal 15/SIGTERM to checkpointer process with pid 279891
postmaster[279874] DEBUG: sending signal 15/SIGTERM to startup process with pid 279890
postmaster[279874] DEBUG: sending signal 15/SIGTERM to io worker process with pid 279889
postmaster[279874] DEBUG: sending signal 15/SIGTERM to io worker process with pid 279888
postmaster[279874] DEBUG: sending signal 15/SIGTERM to io worker process with pid 279887
postmaster[279874] DEBUG: updating PMState from PM_STOP_BACKENDS to PM_WAIT_BACKENDS

startup[279890] LOG: invalid record length at 0/175A4D8: expected at least 24, got 0
postmaster[279874] DEBUG: postmaster received pmsignal signal
startup[279890] LOG: redo is not required

checkpointer[279891] LOG: checkpoint starting: end-of-recovery immediate wait
checkpointer[279891] LOG: checkpoint complete: wrote 0 buffers (0.0%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.007 s, sync=0.002 s, total=0.026 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/175A4D8, redo lsn=0/175A4D8

startup[279890] DEBUG: exit(0)
postmaster[279874] DEBUG: updating PMState from PM_WAIT_BACKENDS to PM_WAIT_BACKENDS

checkpointer[279891] DEBUG: checkpoint skipped because system is idle
checkpointer[279891] DEBUG: checkpoint skipped because system is idle

I don't know how to fix this, but I thought it was worth reporting.

Best regards,

--
Sergey Shinderuk https://postgrespro.com/

Attachment                    Content-Type               Size
delay-recovery-started.patch  text/x-patch               504 bytes
init.sh                       application/x-shellscript  135 bytes
stop-after-crash.sh           application/x-shellscript  172 bytes
logfile.gz                    application/gzip           2.7 KB
