| From: | Sergey Shinderuk <s(dot)shinderuk(at)postgrespro(dot)ru> | 
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Postmaster fails to shut down right after crash restart | 
| Date: | 2025-04-24 12:06:19 | 
| Message-ID: | 63dcad16-22de-4326-a395-5310bc7e05ff@postgrespro.ru | 
| Lists: | pgsql-hackers | 
Hello,
While developing a patch and running regression tests, I noticed that
the postmaster can fail to shut down right after a crash restart. It can
get stuck in the PM_WAIT_BACKENDS state forever.
As far as I understand, the problem occurs when a shutdown signal is 
received before getting PMSIGNAL_RECOVERY_STARTED from the startup 
process. In that case the FatalError flag is not cleared, and the 
postmaster is stuck in PM_WAIT_BACKENDS waiting for the checkpointer, 
which ignores SIGTERM.
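To make the suspected ordering problem concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not a copy of the postmaster code: the class ToyPostmaster, its methods, and the rule that FatalError is cleared only while still in PM_STARTUP are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPostmaster:
    """Hypothetical, heavily simplified model of the postmaster state."""
    fatal_error: bool = True            # set while a crash restart is in progress
    state: str = "PM_STARTUP"
    children: set = field(default_factory=lambda: {"startup", "checkpointer"})

    def recovery_started(self):
        # Assumption: PMSIGNAL_RECOVERY_STARTED clears FatalError only if
        # it arrives while the postmaster is still in PM_STARTUP.
        if self.state == "PM_STARTUP":
            self.fatal_error = False
            self.state = "PM_RECOVERY"

    def shutdown_request(self):
        # Fast shutdown: SIGTERM goes to every child, but the
        # checkpointer ignores SIGTERM and keeps running.
        self.state = "PM_WAIT_BACKENDS"
        self.children = {c for c in self.children if c == "checkpointer"}

    def can_leave_wait_backends(self):
        # Assumption: while FatalError is set, the postmaster also waits
        # for the checkpointer to exit; otherwise it does not.
        waited_for = set(self.children)
        if not self.fatal_error:
            waited_for.discard("checkpointer")
        return not waited_for


def simulate(events):
    """Apply events in order and return the resulting toy postmaster."""
    pm = ToyPostmaster()
    for event in events:
        if event == "recovery_started":
            pm.recovery_started()
        elif event == "shutdown":
            pm.shutdown_request()
    return pm


stuck = simulate(["shutdown", "recovery_started"])
ok = simulate(["recovery_started", "shutdown"])
print(stuck.can_leave_wait_backends(), ok.can_leave_wait_backends())
```

In this model the only difference between the two runs is the order of the two events: if the shutdown request arrives first, FatalError stays set and the model never leaves PM_WAIT_BACKENDS.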
To reproduce the problem easily, I added a pg_usleep call in
xlogrecovery.c just before
SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED). See the attached patch.
Then I ran a script that simulates a crash and runs pg_ctl stop:
$ ./init.sh
[...]
$ ./stop-after-crash.sh
waiting for server to start.... done
server started
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down
Some processes are still alive:
$ ps uf -C postgres
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
sergey    279874  0.0  0.0 222816 28560 ?        Ss   14:25   0:00 /home/sergey/pgwork/devel/install/bin/postgres -D data
sergey    279887  0.0  0.0 222772  5664 ?        Ss   14:25   0:00  \_ postgres: io worker 0
sergey    279888  0.0  0.0 222772  5664 ?        Ss   14:25   0:00  \_ postgres: io worker 1
sergey    279889  0.0  0.0 222772  5664 ?        Ss   14:25   0:00  \_ postgres: io worker 2
sergey    279891  0.0  0.0 222884  8480 ?        Ss   14:25   0:00  \_ postgres: checkpointer
Here is an excerpt from the debug log:
postmaster[279874] LOG:  all server processes terminated; reinitializing
startup[279890] LOG:  database system was interrupted; last known up at 2025-04-24 14:25:58 MSK
startup[279890] LOG:  database system was not properly shut down; automatic recovery in progress
postmaster[279874] DEBUG:  postmaster received shutdown request signal
postmaster[279874] LOG:  received fast shutdown request
postmaster[279874] DEBUG:  updating PMState from PM_STARTUP to PM_STOP_BACKENDS
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to background writer process with pid 279892
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to checkpointer process with pid 279891
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to startup process with pid 279890
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to io worker process with pid 279889
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to io worker process with pid 279888
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to io worker process with pid 279887
postmaster[279874] DEBUG:  updating PMState from PM_STOP_BACKENDS to PM_WAIT_BACKENDS
startup[279890] LOG:  invalid record length at 0/175A4D8: expected at least 24, got 0
postmaster[279874] DEBUG:  postmaster received pmsignal signal
startup[279890] LOG:  redo is not required
checkpointer[279891] LOG:  checkpoint starting: end-of-recovery immediate wait
checkpointer[279891] LOG:  checkpoint complete: wrote 0 buffers (0.0%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.007 s, sync=0.002 s, total=0.026 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/175A4D8, redo lsn=0/175A4D8
startup[279890] DEBUG:  exit(0)
postmaster[279874] DEBUG:  updating PMState from PM_WAIT_BACKENDS to PM_WAIT_BACKENDS
checkpointer[279891] DEBUG:  checkpoint skipped because system is idle
checkpointer[279891] DEBUG:  checkpoint skipped because system is idle
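The state transitions are easy to lose among the other debug output, so a small script can pull them out of a log. The following is a rough sketch: the helper names are hypothetical, and the pattern assumes the "updating PMState from X to Y" wording seen in the excerpt above. It tolerates a line break inside the message, since mail clients may wrap long lines.

```python
import re

# Matches the DEBUG message format shown in the excerpt; \s+ also spans
# a newline in case the log text was wrapped by a mail client.
TRANSITION = re.compile(r"updating PMState from (\w+)\s+to\s+(\w+)")


def pmstate_transitions(log_text):
    """Return a list of (old_state, new_state) pairs in log order."""
    return TRANSITION.findall(log_text)


def self_transitions(transitions):
    """Return transitions whose old and new state are identical."""
    return [(old, new) for old, new in transitions if old == new]


# Three lines taken from the excerpt, one of them wrapped.
sample = """\
postmaster[279874] DEBUG:  updating PMState from PM_STARTUP to
PM_STOP_BACKENDS
postmaster[279874] DEBUG:  updating PMState from PM_STOP_BACKENDS to PM_WAIT_BACKENDS
postmaster[279874] DEBUG:  updating PMState from PM_WAIT_BACKENDS to PM_WAIT_BACKENDS
"""

transitions = pmstate_transitions(sample)
for old, new in transitions:
    print(f"{old} -> {new}")
print("self-transitions:", self_transitions(transitions))
```

On this excerpt the last recorded transition is from PM_WAIT_BACKENDS to PM_WAIT_BACKENDS, which matches the observation that the postmaster never leaves that state.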
I don't know how to fix this, but I thought it was worth reporting.
Best regards,
-- 
Sergey Shinderuk		https://postgrespro.com/
| Attachment | Content-Type | Size | 
|---|---|---|
| delay-recovery-started.patch | text/x-patch | 504 bytes | 
| init.sh | application/x-shellscript | 135 bytes | 
| stop-after-crash.sh | application/x-shellscript | 172 bytes | 
| logfile.gz | application/gzip | 2.7 KB | 