From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Noah Misch <noah(at)leadboat(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-committers <pgsql-committers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: pgsql: TAP tests: check for postmaster.pid anyway when "pg_ctl start" f |
Date: | 2022-02-10 22:38:06 |
Message-ID: | 20220210223806.ikpf57pz2ilamqam@alap3.anarazel.de |
Lists: | pgsql-committers |
Hi,
On 2022-02-10 14:58:57 -0500, Tom Lane wrote:
> So it looks to me like the core problem is that pg_ctl's do_stop()
> is too trusting: if it once sees the postmaster PID as alive, it
> figures that's the end of the story.
Agreed, that's a problem. Even if it wasn't the cause of slot tests on AIX.
kill -9 can be slow on other operating systems as well. I think on Linux, for
example, it's not processed immediately if the process is already in the
middle of dumping core.
> (do_restart and do_promote seem similarly naive ... and why are there so
> many copies of the wait loop, anyway?)
:(
There's generally some confusing duplication in pg_ctl.c itself, and between
pg_ctl and other programs. E.g. CreateRestrictedProcess() exists both in
pg_ctl and, slightly differently, in restricted_token.c. Wut? Also, why is
restricted_token.c in common/, rather than port/?
> Another idea is to modify Cluster.pm's kill9 to not return until the
> PID is gone according to kill(0). On the other hand, that'd mask
> problems like this, so I'm not entirely enthused about changing
> that end of things.
Seems reasonable.
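To make the "wait until the PID is gone according to kill(0)" idea concrete, here is a rough sketch in C. This is not code from pg_ctl.c or Cluster.pm: the helper names pid_is_alive() and wait_for_pid_exit(), the 10 ms poll interval, and the millisecond timeout are all assumptions made for illustration. One caveat: kill(pid, 0) still reports an unreaped child (a zombie) as existing, so this only works as intended for processes that someone else reaps.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: report whether the PID still exists as far as
 * kill(pid, 0) can tell. Signal 0 performs only the existence and
 * permission checks; nothing is delivered.
 */
static bool
pid_is_alive(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return true;

    /* EPERM means the process exists but we are not allowed to signal it. */
    return errno == EPERM;
}

/*
 * Hypothetical helper: poll until the process is gone or timeout_ms has
 * elapsed. Returns true if the process disappeared, false on timeout.
 */
static bool
wait_for_pid_exit(pid_t pid, int timeout_ms)
{
    const struct timespec delay = {0, 10 * 1000 * 1000};    /* 10 ms */
    int         waited_ms = 0;

    for (;;)
    {
        /* Recheck on every iteration rather than trusting an earlier probe. */
        if (!pid_is_alive(pid))
            return true;
        if (waited_ms >= timeout_ms)
            return false;

        nanosleep(&delay, NULL);
        waited_ms += 10;
    }
}
```

A caller would send the signal first and then call something like wait_for_pid_exit(pid, 5000), treating a false result as "still alive after the timeout" rather than assuming the kill took effect immediately.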
Greetings,
Andres Freund