From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Florian Pflug <fgp(at)phlo(dot)org> |
Cc: | Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Latch implementation that wakes on postmaster death on both win32 and Unix |
Date: | 2011-07-05 06:49:03 |
Message-ID: | 4E12B3DF.4000705@enterprisedb.com |
Lists: | pgsql-hackers |
On 05.07.2011 00:42, Florian Pflug wrote:
> On Jul4, 2011, at 23:11 , Peter Geoghegan wrote:
>> On 4 July 2011 17:36, Florian Pflug<fgp(at)phlo(dot)org> wrote:
>>> Btw, with the death-watch / life-sign / whatever infrastructure in place,
>>> shouldn't PostmasterIsAlive() be using that instead of getppid() / kill(0)?
>>
>> Hmm, maybe. That seems like a separate issue though, that can be
>> addressed with another patch. It does have the considerable
>> disadvantage of making Heikki's proposed assertion failure useless. Is
>> the implementation of PostmasterIsAlive() really a problem at the
>> moment?
>
> I'm not sure that there is currently a guarantee that PostmasterIsAlive
> will return false immediately after select() indicates postmaster
> death. If e.g. the postmaster's parent is still running (which happens
> for example if you launch postgres via daemontools), the re-parenting of
> backends to init might not happen until the postmaster zombie has been
> vanquished by its parent's call of waitpid(). It's not entirely
> inconceivable for getppid() to then return the (dead) postmaster's pid
> until that waitpid() call has occurred.
Good point, and testing shows that that is exactly what happens at least
on Linux (see attached test program). So, as the code stands, the
children will go into a busy loop until the grandparent calls waitpid().
That's not good.
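
One part of this can be shown in isolation: kill(pid, 0) keeps succeeding for a process that has exited but has not yet been reaped. The sketch below is not the attached forktest.c; pid_exists() and probe_zombie() are hypothetical names made up for illustration. It only demonstrates the kill(0) half of the check, seen from the parent's side, and says nothing about what getppid() returns in the children.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the kill(pid, 0) existence probe. */
static bool
pid_exists(pid_t pid)
{
    /* EPERM means the process exists but we may not signal it. */
    return kill(pid, 0) == 0 || errno == EPERM;
}

/*
 * Fork a child that exits at once, then probe its pid before and after
 * reaping it. Returns 0 on success, -1 if fork() or waitpid() fails.
 */
static int
probe_zombie(bool *before_reap, bool *after_reap)
{
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0);               /* child: die immediately */

    /*
     * The child is either still running or already a zombie. In both
     * cases its pid has not been released, so the probe succeeds.
     */
    *before_reap = pid_exists(child);

    /* Reap the child; only now does its pid go away. */
    if (waitpid(child, NULL, 0) != child)
        return -1;

    *after_reap = pid_exists(child);
    return 0;
}
```

Calling probe_zombie() should report the pid as existing before the waitpid() and as gone afterwards, which is why a kill(0)-based liveness test cannot notice the death until whoever is responsible for reaping gets around to it.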
In that light, I agree we should replace kill() in PostmasterIsAlive()
with read() on the pipe. It would react faster than the kill()-based
test, which seems like a good thing. Or perhaps do both, and return
false if either test says the postmaster is dead.
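
A rough sketch of what the pipe-based check could look like, including the "do both" variant. This is not the patch's code: the function names and the way the fd and pid are passed in are assumptions for illustration. It assumes the read end has been set to O_NONBLOCK, that only the postmaster holds the write end open, and that nothing is ever written to the pipe.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: true while some process still holds the write end
 * of the pipe open. read_fd must be in non-blocking mode.
 */
static bool
pipe_writer_is_alive(int read_fd)
{
    char        c;
    ssize_t     rc;

    do
    {
        rc = read(read_fd, &c, 1);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
    {
        /* No data and no EOF: the write end is still open. */
        return errno == EAGAIN || errno == EWOULDBLOCK;
    }

    /*
     * rc == 0 is EOF: every copy of the write end has been closed.
     * rc > 0 is unexpected data, but it implies a writer exists.
     */
    return rc > 0;
}

/* Hypothetical helper: the existing kill(pid, 0) style probe. */
static bool
pid_responds(pid_t pid)
{
    return kill(pid, 0) == 0 || errno == EPERM;
}

/* "Do both": report death as soon as either test says so. */
static bool
postmaster_is_alive_sketch(int read_fd, pid_t postmaster_pid)
{
    return pipe_writer_is_alive(read_fd) && pid_responds(postmaster_pid);
}
```

The read() returns EOF as soon as the last write end is closed, which the kernel does when the postmaster exits, so this does not depend on when the zombie gets reaped.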
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
forktest.c | text/x-csrc | 714 bytes |