From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Igor <igorya(dot)inscriptio(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument" |
Date: | 2012-11-18 19:13:49 |
Message-ID: | 20121118191349.GB1325@awork2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 2012-11-18 14:07:37 -0500, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2012-11-18 13:18:42 -0500, Tom Lane wrote:
> >> Well, we have two reports of people trying such values (assuming that
> >> #7545 actually is the same thing), and it didn't work for either of
> >> them. I don't think it's a problem to restrict the value to something
> >> that will work rather than fail.
>
> > Thats a good point. But #7545 was on OSX so its not really that much
> > evidence a larger value doesn't work on non-bsdish systems.
>
> > In fact, a setting of 32 days seems not to cause any immediate problems
> > here on linux, even when choosing the timeout in a way it results in a
> > negative timeout value for poll. Not sure what its waiting for, but it
> > doesn't crash.
>
> If you look closely at what's happening in syslogger.c, you'll realize
> that it's a phase of the moon kind of behavior, because the
> next_rotation_time is rounded off to a multiple of the specified
> log_rotation_age. Then you'll get an overflow (or not) depending on
> whether now() is within 2^31 msec of that target time. So for example
> with a 30-day log_rotation_age, you'd be seeing failures during the
> first five days of every "month", where a month is defined as
> exactly-30-day intervals since the epoch. And I think there's a
> timezone correction in there too, which would shift the trouble
> intervals around for different people.
I verified with strace that I calculated correctly and got a negative
value:
poll([{fd=10, events=POLLIN}, {fd=3, events=POLLIN}], 2, -1033055888) = 1 ([{fd=10, revents=POLLIN}])
(that poll finished was due to a SIGHUP)
> Anyway, even if Linux for some reason doesn't reject negative values,
> I think we need to limit the GUC's range so we don't try to use them.
> Maybe you'd get sane behavior and maybe you wouldn't. I notice that
> the Single Unix Spec doesn't specify EINVAL for negative timeout
> values, but it also doesn't define what happens for a negative value
> other than -1. So this is basically an unspecified case and it's
> incumbent on us to not do it if we want portable behavior.
I aggree that we need to do something. I just think it might be enough
to clamp the timeout value passed to WaitLatchOrSocket to the maximum
valid value. The rest of SysLoggerMain seems to work correctly in that
case.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2012-11-18 19:15:16 | Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument" |
Previous Message | Tom Lane | 2012-11-18 19:07:37 | Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument" |