From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Justin Clift <justin(at)postgresql(dot)org> |
Cc: | Michael Devogelaere <michael(at)digibel(dot)be>, Jan Wieck <janwieck(at)yahoo(dot)com>, Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, PostgreSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PostgreSQL crashes with Qmail-SQL |
Date: | 2002-01-25 01:50:06 |
Message-ID: | 21819.1011923406@sss.pgh.pa.us |
Lists: | pgsql-hackers |

I said:
> That still leaves us with all the defunct postmaster children to explain
> though. Hmm. I wonder exactly what the postmaster does when someone
> forcibly removes its socket file... probably system-dependent, but I
> could certainly believe getting into a busy-wait loop of select/accept.
> That doesn't look like it should prevent SIGCHLD from getting noticed,
> though.

On Linux (at least RH 7.2), the answer to what happens when the socket
file is removed is: nothing. Clients can't connect anymore, but the
postmaster gets no error indicating that anything is wrong. So it sits.
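
For anyone who wants to reproduce the socket-file observation outside the postmaster, here is a minimal sketch, assuming a Linux-like system. The function names and the socket path passed in are hypothetical and have nothing to do with the actual postmaster source. It listens on a Unix-domain socket, unlinks the file, then records what a client sees and what select() on the listening socket reports.

```c
#include <errno.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: try to connect to `path`; return 0 or the errno. */
int try_connect(const char *path)
{
    struct sockaddr_un addr;
    int err = 0;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return errno;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        err = errno;
    close(fd);
    return err;
}

/*
 * Hypothetical demo: listen on `path`, remove the socket file, then store
 * the client's connect() errno in *client_err and the result of a short
 * select() on the listening socket in *select_rc.
 * Returns 0 if the setup worked, -1 otherwise.
 */
int socket_removal_demo(const char *path, int *client_err, int *select_rc)
{
    struct sockaddr_un addr;
    struct timeval tv;
    fd_set rfds;
    int lfd;

    unlink(path);                       /* clear any stale file first */
    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(lfd, 5) < 0)
    {
        close(lfd);
        return -1;
    }

    unlink(path);                       /* simulate "rm" of the socket file */

    *client_err = try_connect(path);    /* the client can no longer connect */

    /* The listener just waits: no readable event and no error reported. */
    FD_ZERO(&rfds);
    FD_SET(lfd, &rfds);
    tv.tv_sec = 0;
    tv.tv_usec = 200000;
    *select_rc = select(lfd + 1, &rfds, NULL, NULL, &tv);

    close(lfd);
    return 0;
}
```

Called with a scratch path under /tmp, I would expect the client side to fail with ENOENT on Linux and select() to time out with a return value of 0, which matches "so it sits". Other platforms may behave differently.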

And that means that the 7.1-to-7.2 change I mentioned before is
relevant. In 7.1, the SIGCHLD signal handler blocked signals at its
beginning, and didn't think to unblock them on exit. So after servicing
one SIGCHLD interrupt, the postmaster would end up sitting at its
select() with signals blocked. Further SIGCHLDs would not get serviced
until the next spin around the outer loop re-enabled interrupts.
Normally, no big deal, but with no new connection requests coming in,
the postmaster wouldn't ever get around to wait()ing for its last few
children. (7.2 re-enables signals at exit from the handler, so I don't
think it will show this problem; and indeed I don't see any zombies
after "rm /tmp/.s.PGSQL.5432" during a run of Michael's benchmark
script with 7.2. Not incidentally, I do observe a complete lack of any
complaints out of the benchmark script; it keeps flailing along without
any sign that all its database connection attempts are failing.)

In short: all the reported facts can be explained by the theory that
*something* removed the socket file during that long test run.

regards, tom lane