From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Rémi Zara <remi_zara(at)mac(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, cm(at)enterprisedb(dot)com, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Unportable implementation of background worker start |
Date: | 2017-04-25 15:57:30 |
Message-ID: | 19167.1493135850@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Rémi Zara <remi_zara(at)mac(dot)com> writes:
>> On 25 Apr 2017, at 01:47, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> It looks like coypu is going to need manual intervention (ie, kill -9
>> on the leftover postmaster) to get unwedged :-(. That's particularly
>> disturbing because it implies that ServerLoop isn't iterating at all;
>> otherwise, it'd have noticed by now that the buildfarm script deleted
>> its data directory out from under it.
> coypu was not stuck (no buildfarm-related process running), but it failed to clean up shared memory and semaphores.
> I’ve done the clean-up.
Huh, that's even more interesting.
Looking at the code, what ServerLoop actually does when it notices that
the postmaster.pid file has been removed is
    kill(MyProcPid, SIGQUIT);
So if our hypothesis is that pselect() failed to unblock signals,
then failure to quit is easily explained: the postmaster never
received/acted on its own signal. But that should have left you
with a running postmaster holding the shared memory and semaphores.
Seems like if it is gone but it failed to remove those, somebody must've
kill -9'd it ... but who? I see nothing in the buildfarm script that
would.
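
The pselect() hypothesis can be checked in isolation with a minimal sketch along these lines. It is not PostgreSQL code: the names self_signal_reaches_pselect and on_sigquit are hypothetical, and it assumes a single-threaded process. It blocks SIGQUIT, sends the signal to itself the way ServerLoop does, and then waits in pselect() with an empty signal mask. On a platform where pselect() unblocks signals as POSIX describes, the handler should run and the call should fail with EINTR; if pselect() never unblocks, the call simply times out and the signal stays pending.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Set by the handler; hypothetical stand-in for "postmaster acted on SIGQUIT". */
static volatile sig_atomic_t got_sigquit = 0;

static void on_sigquit(int signo)
{
    (void) signo;
    got_sigquit = 1;
}

/*
 * Returns 1 if a self-sent SIGQUIT is delivered while pselect() waits,
 * 0 if pselect() returned without delivering it, -1 on setup failure.
 */
int self_signal_reaches_pselect(void)
{
    struct sigaction sa;
    sigset_t block_set, old_set, wait_set;
    struct timespec timeout = {1, 0};   /* give up after one second */
    int rc, saved_errno;

    got_sigquit = 0;
    sa.sa_handler = on_sigquit;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGQUIT, &sa, NULL) != 0)
        return -1;

    /* Keep SIGQUIT blocked outside the wait, as in the scenario discussed. */
    sigemptyset(&block_set);
    sigaddset(&block_set, SIGQUIT);
    if (sigprocmask(SIG_BLOCK, &block_set, &old_set) != 0)
        return -1;

    /* Signal ourselves; while blocked, the signal only becomes pending. */
    if (kill(getpid(), SIGQUIT) != 0)
        return -1;
    assert(got_sigquit == 0);

    /* Wait with an empty mask: this is where delivery should happen. */
    sigemptyset(&wait_set);
    rc = pselect(0, NULL, NULL, NULL, &timeout, &wait_set);
    saved_errno = errno;

    /* Restore the caller's signal mask before reporting. */
    sigprocmask(SIG_SETMASK, &old_set, NULL);

    return (rc == -1 && saved_errno == EINTR && got_sigquit) ? 1 : 0;
}
```

Calling self_signal_reaches_pselect() from a small main() and printing the result would show which case a given platform falls into. A result of 0 would match the "never received/acted on its own signal" explanation, though it would not by itself explain the missing postmaster.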
regards, tom lane