Re: test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dave Page <dave(dot)page(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, CM Team <cm(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, bernd(dot)helmle(at)credativ(dot)de
Subject: Re: test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)
Date: 2014-09-29 19:37:33
Message-ID: 20140929193733.GB14400@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-09-29 15:24:55 -0400, Robert Haas wrote:
> On Mon, Sep 29, 2014 at 2:52 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > If that theory is true, wouldn't things get unstuck every time a new
> > connection comes in? Or 60 seconds have passed? That's not to say this
> > isn't wrong, but still?
>
> There aren't going to be any new connections arriving when running
> the contrib regression tests, I believe, so I don't think there is an
> escape hatch there.

I thought you might have tried connecting... And I'd have guessed you'd
have reported it if that had fixed things.

> I didn't think to check how timeout was set in
> ServerLoop, and it does look like the maximum ought to be 60 seconds,
> so either there's some other ingredient I'm missing here, or the whole
> theory is just wrong altogether. :-(

Yea :(. Note how signals are blocked in all the signal handlers and only
unblocked for a very short time (the sleep).

(stare at random shit for far too long)

Ah. DetermineSleepTime(), which is called while signals are unblocked (!),
modifies BackgroundWorkerList. Previously it only iterated over the list,
without modifying it. That was already of quite debatable safety, but
modifying the list without having blocked signals is most definitely
broken. The modification was introduced by 7f7485a0c...

If you can manually run stuff on that machine, it'd be rather helpful if
you could wrap the HaveCrashedWorker() loop in
PG_SETMASK(&BlockSig); ... PG_SETMASK(&UnBlockSig);
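
To illustrate the pattern, here is a rough standalone sketch of "block
signals, modify the list, restore the mask". It is not postmaster code:
Worker, worker_list, add_worker(), forget_crashed_workers() and
count_workers() are hypothetical names made up for the example, and plain
POSIX sigprocmask() stands in for the PG_SETMASK(&BlockSig) /
PG_SETMASK(&UnBlockSig) pair.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry in a worker list. */
typedef struct Worker
{
    struct Worker *next;
    int            crashed;     /* nonzero: entry should be dropped */
} Worker;

/* Hypothetical list that a signal handler might also walk. */
static Worker *worker_list = NULL;

/* Push a new entry at the head of the list. */
static void
add_worker(int crashed)
{
    Worker *w = malloc(sizeof(Worker));

    assert(w != NULL);
    w->crashed = crashed;
    w->next = worker_list;
    worker_list = w;
}

/* Count the entries currently in the list. */
static int
count_workers(void)
{
    int     n = 0;
    Worker *w;

    for (w = worker_list; w != NULL; w = w->next)
        n++;
    return n;
}

/*
 * Unlink and free crashed entries. All signals are blocked for the
 * duration, so a handler walking the same list cannot run while the
 * list is half-updated. Returns the number of entries removed.
 */
static int
forget_crashed_workers(void)
{
    sigset_t  block_all;
    sigset_t  saved;
    Worker  **link = &worker_list;
    int       removed = 0;
    int       rc;

    sigfillset(&block_all);
    rc = sigprocmask(SIG_BLOCK, &block_all, &saved);
    assert(rc == 0);

    while (*link != NULL)
    {
        Worker *w = *link;

        if (w->crashed)
        {
            *link = w->next;    /* unlink before freeing */
            free(w);
            removed++;
        }
        else
            link = &w->next;
    }

    /* Restore whatever mask was in effect on entry. */
    rc = sigprocmask(SIG_SETMASK, &saved, NULL);
    assert(rc == 0);
    (void) rc;                  /* quiet unused warning under NDEBUG */

    return removed;
}
```

The sketch restores the saved mask rather than unconditionally unblocking,
so it behaves the same whether or not the caller already had signals
blocked.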

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
