From: | Sergey Shinderuk <s(dot)shinderuk(at)postgrespro(dot)ru> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Postmaster fails to shut down right after crash restart |
Date: | 2025-04-24 12:06:19 |
Message-ID: | 63dcad16-22de-4326-a395-5310bc7e05ff@postgrespro.ru |
Lists: | pgsql-hackers |
Hello,
While developing a patch and running regression tests, I noticed that the
postmaster could fail to shut down right after a crash restart. It could
get stuck in the PM_WAIT_BACKENDS state forever.
As far as I understand, the problem occurs when a shutdown signal is
received before getting PMSIGNAL_RECOVERY_STARTED from the startup
process. In that case the FatalError flag is not cleared, and the
postmaster is stuck in PM_WAIT_BACKENDS waiting for the checkpointer,
which ignores SIGTERM.
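To make the ordering issue concrete, here is a toy model of the race as I understand it. It is only a sketch, not the postmaster code: the struct, the toy_* functions and the PM_NEXT_PHASE state are made up for illustration, and the rule that the flag is cleared only while the state is still PM_STARTUP is an assumption that matches the behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for postmaster state; NOT the real definitions. */
typedef enum
{
    PM_STARTUP,
    PM_WAIT_BACKENDS,
    PM_NEXT_PHASE               /* hypothetical: whatever follows the wait */
} ToyPMState;

typedef struct
{
    ToyPMState  state;
    bool        fatal_error;            /* set during crash restart */
    bool        checkpointer_alive;
    bool        other_children_alive;   /* startup, bgwriter, io workers */
} ToyPostmaster;

/* Startup process reports that recovery has begun. Assumption: the flag
 * is cleared only if the postmaster is still in PM_STARTUP. */
static void
toy_recovery_started(ToyPostmaster *pm)
{
    if (pm->state == PM_STARTUP)
        pm->fatal_error = false;
}

/* Fast shutdown: SIGTERM goes to every child. The checkpointer ignores
 * SIGTERM, so only the other children exit. */
static void
toy_fast_shutdown(ToyPostmaster *pm)
{
    pm->state = PM_WAIT_BACKENDS;
    pm->other_children_alive = false;
    /* checkpointer_alive is intentionally left unchanged */
}

/* Try to leave PM_WAIT_BACKENDS. While fatal_error is set, the
 * checkpointer is one of the processes that must be gone first. */
static bool
toy_advance(ToyPostmaster *pm)
{
    assert(pm->state == PM_WAIT_BACKENDS);
    if (pm->other_children_alive)
        return false;
    if (pm->fatal_error && pm->checkpointer_alive)
        return false;           /* stuck: nothing will stop the checkpointer */
    pm->state = PM_NEXT_PHASE;
    return true;
}

/* Play one ordering of the two events; true means the wait was left. */
bool
toy_run(bool shutdown_arrives_first)
{
    ToyPostmaster pm = {PM_STARTUP, true, true, true};

    if (shutdown_arrives_first)
    {
        toy_fast_shutdown(&pm);
        toy_recovery_started(&pm);  /* too late, the flag stays set */
    }
    else
    {
        toy_recovery_started(&pm);
        toy_fast_shutdown(&pm);
    }
    return toy_advance(&pm);
}
```

In this model toy_run(true) returns false (the stuck case reported here), while toy_run(false) returns true because the flag is cleared before the shutdown request arrives.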
To reproduce the problem easily, I added a pg_usleep call in
xlogrecovery.c just before
SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED). See the attached patch.
Then I ran a script that simulates a crash and runs pg_ctl stop:
$ ./init.sh
[...]
$ ./stop-after-crash.sh
waiting for server to start.... done
server started
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down
Some processes are still alive:
$ ps uf -C postgres
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
sergey 279874 0.0 0.0 222816 28560 ? Ss 14:25 0:00 /home/sergey/pgwork/devel/install/bin/postgres -D data
sergey 279887 0.0 0.0 222772 5664 ? Ss 14:25 0:00 \_ postgres: io worker 0
sergey 279888 0.0 0.0 222772 5664 ? Ss 14:25 0:00 \_ postgres: io worker 1
sergey 279889 0.0 0.0 222772 5664 ? Ss 14:25 0:00 \_ postgres: io worker 2
sergey 279891 0.0 0.0 222884 8480 ? Ss 14:25 0:00 \_ postgres: checkpointer
Here is an excerpt from the debug log:
postmaster[279874] LOG: all server processes terminated; reinitializing
startup[279890] LOG: database system was interrupted; last known up at 2025-04-24 14:25:58 MSK
startup[279890] LOG: database system was not properly shut down; automatic recovery in progress
postmaster[279874] DEBUG: postmaster received shutdown request signal
postmaster[279874] LOG: received fast shutdown request
postmaster[279874] DEBUG: updating PMState from PM_STARTUP to PM_STOP_BACKENDS
postmaster[279874] DEBUG: sending signal 15/SIGTERM to background writer process with pid 279892
postmaster[279874] DEBUG: sending signal 15/SIGTERM to checkpointer process with pid 279891
postmaster[279874] DEBUG: sending signal 15/SIGTERM to startup process with pid 279890
postmaster[279874] DEBUG: sending signal 15/SIGTERM to io worker process with pid 279889
postmaster[279874] DEBUG: sending signal 15/SIGTERM to io worker process with pid 279888
postmaster[279874] DEBUG: sending signal 15/SIGTERM to io worker process with pid 279887
postmaster[279874] DEBUG: updating PMState from PM_STOP_BACKENDS to PM_WAIT_BACKENDS
startup[279890] LOG: invalid record length at 0/175A4D8: expected at least 24, got 0
postmaster[279874] DEBUG: postmaster received pmsignal signal
startup[279890] LOG: redo is not required
checkpointer[279891] LOG: checkpoint starting: end-of-recovery immediate wait
checkpointer[279891] LOG: checkpoint complete: wrote 0 buffers (0.0%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.007 s, sync=0.002 s, total=0.026 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/175A4D8, redo lsn=0/175A4D8
startup[279890] DEBUG: exit(0)
postmaster[279874] DEBUG: updating PMState from PM_WAIT_BACKENDS to PM_WAIT_BACKENDS
checkpointer[279891] DEBUG: checkpoint skipped because system is idle
checkpointer[279891] DEBUG: checkpoint skipped because system is idle
I don't know how to fix this, but I thought it was worth reporting.
Best regards,
--
Sergey Shinderuk https://postgrespro.com/
Attachment | Content-Type | Size |
---|---|---|
delay-recovery-started.patch | text/x-patch | 504 bytes |
init.sh | application/x-shellscript | 135 bytes |
stop-after-crash.sh | application/x-shellscript | 172 bytes |
logfile.gz | application/gzip | 2.7 KB |