| From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | DELETE PENDING strikes back, via pg_ctl stop/start |
| Date: | 2024-08-21 10:00:00 |
| Message-ID: | 8eda5ecc-24c5-95ce-d719-1585e2d693b2@gmail.com |
| Lists: | pgsql-hackers |
Hello hackers,
As a recent failure produced by drongo [1] shows, a pg_ctl stop/start
sequence may break on Windows due to the transient DELETE PENDING state of
postmaster.pid.
Please look at the excerpt from the failure log:
...
pg_createsubscriber: stopping the subscriber
2024-08-19 18:02:47.608 UTC [6988:4] LOG: received fast shutdown request
2024-08-19 18:02:47.608 UTC [6988:5] LOG: aborting any active transactions
2024-08-19 18:02:47.612 UTC [5884:2] FATAL: terminating walreceiver process due to administrator command
2024-08-19 18:02:47.705 UTC [7036:1] LOG: shutting down
pg_createsubscriber: server was stopped
### the server instance (1) has emitted only "shutting down" so far, but pg_ctl
### considered it stopped and returned 0 to pg_createsubscriber
[18:02:47.900](2.828s) ok 29 - run pg_createsubscriber without --databases
...
pg_createsubscriber: starting the standby with command-line options
pg_createsubscriber: pg_ctl command is: ...
2024-08-19 18:02:48.163 UTC [5284:1] FATAL: could not create lock file "postmaster.pid": File exists
pg_createsubscriber: server was started
pg_createsubscriber: checking settings on subscriber
### pg_createsubscriber attempts to start a new server instance (2), but
### it fails because "postmaster.pid" is still found on disk
2024-08-19 18:02:48.484 UTC [6988:6] LOG: database system is shut down
### the server instance (1) is finally stopped and postmaster.pid is unlinked
With extra debug logging and the ntries limit decreased to 10 (in
CreateLockFile()), I reproduced the failure easily (when running 20 tests
in parallel) and got additional information (see attached).
IIUC, the issue is caused by inconsistent checks for postmaster.pid
existence:
"pg_ctl stop" ... -> get_pgpid() calls fopen(pid_file, "r"),
which fails with ENOENT for the DELETE_PENDING state (see
pgwin32_open_handle()).
"pg_ctl start" ... -> CreateLockFile() calls
fd = open(filename, O_RDWR | O_CREAT | O_EXCL, pg_file_create_mode);
which fails with EEXIST for the same state of postmaster.pid.
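To make the mismatch concrete, here is a minimal model of the two checks. It is only a sketch under stated assumptions: the FileState enum and all function names are hypothetical, and it does not call any Windows or PostgreSQL API. It merely encodes the errno results described above for each state of postmaster.pid.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the states postmaster.pid can be in on Windows. */
typedef enum
{
    FILE_ABSENT,            /* the file is fully gone */
    FILE_PRESENT,           /* the file exists normally */
    FILE_DELETE_PENDING     /* unlinked, but a handle is still open */
} FileState;

/*
 * Models the read-only open done on the "pg_ctl stop" side.
 * Assumption taken from the report: DELETE_PENDING is reported as ENOENT.
 * Returns 0 on success, otherwise the errno value.
 */
int
model_open_read(FileState st)
{
    return (st == FILE_PRESENT) ? 0 : ENOENT;
}

/*
 * Models open(O_RDWR | O_CREAT | O_EXCL) done on the "pg_ctl start" side.
 * Assumption taken from the report: DELETE_PENDING is reported as EEXIST.
 */
int
model_open_excl(FileState st)
{
    return (st == FILE_ABSENT) ? 0 : EEXIST;
}

/* The stop side treats "cannot open the pid file" as "server is stopped". */
bool
stop_side_sees_stopped(FileState st)
{
    return model_open_read(st) == ENOENT;
}

/* The start side can only proceed if the exclusive create succeeds. */
bool
start_side_can_create(FileState st)
{
    return model_open_excl(st) == 0;
}
```

In this model, FILE_DELETE_PENDING is the only state for which stop_side_sees_stopped() returns true while start_side_can_create() returns false, which corresponds to the window seen in the log above.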
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-08-19%2017%3A32%3A54
Best regards,
Alexander
| Attachment | Content-Type | Size |
|---|---|---|
| pg_ctl-debugging.patch | text/x-patch | 3.1 KB |
| regress_log_040_pg_createsubscriber.tar.bz2 | application/x-bzip | 7.6 KB |