From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Strange failure on mamba |
Date: | 2022-11-17 22:08:09 |
Message-ID: | 2051761.1668722889@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> I wonder why the walreceiver didn't start in
> 008_min_recovery_point_node_3.log here:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2022-11-16%2023%3A13%3A38
mamba has been showing intermittent failures in various replication
tests since day one. My guess is that it's slow enough to be
particularly subject to the signal-handler race conditions that we
know exist in walreceivers and elsewhere. (Now, it wasn't any faster
in its previous incarnation as a macOS critter. But maybe modern
NetBSD has different scheduler behavior than ancient macOS and that
contributes somehow. Or maybe there's some other NetBSD weirdness
in here.)
I've tried to reproduce manually, without much success :-(
Like many of its other failures, there's a suggestive postmaster
log entry at the very end:
2022-11-16 19:45:53.851 EST [2036:4] LOG: received immediate shutdown request
2022-11-16 19:45:58.873 EST [2036:5] LOG: issuing SIGKILL to recalcitrant children
2022-11-16 19:45:58.881 EST [2036:6] LOG: database system is shut down
So some postmaster child is stuck somewhere where it's not responding
to SIGQUIT. While it's not unreasonable to guess that that's a
walreceiver, there's no hard evidence of it here. I've been wondering
if it'd be worth patching the postmaster so that it's a bit more verbose
about which children it had to SIGKILL. I've also wondered about
changing the SIGKILL to SIGABRT in hopes of reaping a core file that
could be investigated.
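To make the two ideas above concrete, here is a minimal sketch in Python. It is not PostgreSQL code and not a proposed patch: `stop_children`, `spawn_child`, the `grace` parameter, and the log wording are all hypothetical names invented for illustration. It assumes a POSIX system. The parent sends SIGQUIT, waits for a grace period, then names each child that is still alive and sends it SIGABRT instead of SIGKILL. Whether a core file actually appears depends on the platform's core-dump settings (for example `ulimit -c`).

```python
import os
import signal
import sys
import time


def spawn_child(ignore_sigquit):
    """Fork a child that sleeps forever (hypothetical test helper).

    SIGQUIT is set to "ignore" in the parent before fork() so the child
    inherits that disposition with no window in which it could be killed.
    """
    old = None
    if ignore_sigquit:
        old = signal.signal(signal.SIGQUIT, signal.SIG_IGN)
    pid = os.fork()
    if pid == 0:
        try:
            while True:
                time.sleep(1)
        finally:
            os._exit(0)
    if old is not None:
        signal.signal(signal.SIGQUIT, old)  # restore the parent's handler
    return pid


def stop_children(pids, grace=5.0, final_signal=signal.SIGABRT, log=sys.stderr):
    """SIGQUIT all children, wait up to `grace` seconds, then escalate.

    Returns {pid: terminating signal number, or None if it exited normally}.
    """
    for pid in pids:
        os.kill(pid, signal.SIGQUIT)

    remaining = set(pids)
    result = {}

    def reap(block):
        for pid in list(remaining):
            done, status = os.waitpid(pid, 0 if block else os.WNOHANG)
            if done == pid:
                remaining.discard(pid)
                result[pid] = (
                    os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
                )

    deadline = time.monotonic() + grace
    while remaining and time.monotonic() < deadline:
        reap(block=False)
        time.sleep(0.05)

    # Report exactly which children were recalcitrant before escalating.
    name = signal.Signals(final_signal).name
    for pid in sorted(remaining):
        print(f"child {pid} did not exit after SIGQUIT, sending {name}", file=log)
        os.kill(pid, final_signal)
    reap(block=True)
    return result
```

Calling `stop_children([a, b], grace=0.5)` on one child with default SIGQUIT handling and one that ignores SIGQUIT logs only the second PID, and the returned mapping shows which signal ended each process.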
regards, tom lane