From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Escaping from blocked send() reprised. |
Date: | 2015-01-12 00:37:53 |
Message-ID: | 20150112003753.GB2722746@tornado.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jan 12, 2015 at 12:40:50AM +0100, Andres Freund wrote:
> On 2015-01-11 16:36:07 -0500, Noah Misch wrote:
> > On Sat, Jan 10, 2015 at 03:25:42AM +0100, Andres Freund wrote:
> > > 0001-Allow-latches-to-wait-for-socket-writability-without.patch
> > > Imo pretty close to commit and can be committed independently.
> >
> > The key open question is whether all platforms of interest can reliably detect
> > end-of-file when poll()ing or select()ing for write only. Older GNU/Linux
> > select() cannot; see attached test program.
>
> Yuck. By my reading that's a violation of posix.
Agreed.
> I did test it a bit, and I didn't see problems, but that obviously
> doesn't say much about old versions.
>
> Afaics we interestingly don't have any poll-less buildfarm animals that
> use unix_latch.c...
More likely is that some system will have a poll() exhibiting the same bug,
possibly via poll() being a wrapper around select(). Systems without poll()
are hard to find; the gnulib manual lists only Windows, BeOS and HP NonStop
OS. HP NonStop OS is the one possibly of interest here. I have never
personally seen a machine running it.
I recommend either (a) taking no action or (b) adding a regression test
verifying WaitLatchOrSocket() conformance in this scenario. Then we can
decide what more to do if failure evidence emerges.
> > We use poll() there anyway, so the bug in that configuration does not
> > affect PostgreSQL. Is it a bellwether of similar bugs in other
> > implementations, bugs that will affect PostgreSQL?
>
> Hm. I can think of two stopgap measures we could add:
>
> 1) If we're using select() and WL_SOCKET_WRITEABLE is set without
> _READABLE, add a timeout of Min(1s, Max(passed_timeout, 1s)). As the
> time spent waiting only for writable normally shouldn't be very long,
> that shouldn't be noticeably bad for power usage.
> 2) Add a SIGPIPE handler that just does a SetLatch(MyLach).
I'm having trouble visualizing those proposed measures in detail, but I trust
that a decent workaround would emerge.
> > > + if (pfds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
> > > + {
> > > + /* EOF/error condition */
> > > + if (wakeEvents & WL_SOCKET_READABLE)
> > > + result |= WL_SOCKET_READABLE;
> > > + if (wakeEvents & WL_SOCKET_WRITEABLE)
> > > + result |= WL_SOCKET_WRITEABLE;
> > > + }
> >
> > With some poll() implementations (e.g. OS X), this can wrongly report
> > WL_SOCKET_WRITEABLE if the peer used shutdown(SHUT_WR). I tentatively think
> > that's acceptable. libpq does not use shutdown(), and other client interfaces
> > would do so at their own risk. Should we worry about hostile clients creating
> > a denial-of-service by causing a server send() to block unexpectedly?
> > Probably not; a user able to send arbitrary TCP traffic to the postmaster port
> > can already achieve that.
>
> Yea, this doesn't seem particularly concerning.
>
> a) They can just stop consuming writes and use noticeable amounts of
> memory by doing output intensive queries. That uses significant os
> resources and is much harder to detect - today.
If there's anything to worry about here (unlikely), it would be with respect
to not-yet-authenticated connections only.
> b) does accepting WL_SOCKET_WRITEABLE without _READABLE change anything
> here? We already allow _WRITABLE...
Today's code translates POLLHUP to WL_SOCKET_READABLE; it must see POLLOUT to
set WL_SOCKET_WRITEABLE. Your patch changes that.
> What happens if you write/send() in
> that state, btw?
write() reports EAGAIN.
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2015-01-12 00:45:41 | Re: Escaping from blocked send() reprised. |
Previous Message | Andres Freund | 2015-01-12 00:37:28 | Re: Using 128-bit integers for sum, avg and statistics aggregates |