Re: Intermittent "make check" failures on hyena

From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Intermittent "make check" failures on hyena
Date: 2006-08-07 12:07:04
Message-ID: 44D72CE8.6020005@sun.com
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
>
> Tom Lane wrote:
>
>> I see one occurrence in the 8.1 branch on hyena, but the failure
>> probability seems to have jumped way up in HEAD since we put in the
>> C-coded pg_regress. This lends weight to the idea that it's a
>> timing-related issue, because pg_regress.c is presumably much faster
>> at forking off a parallel gang of psqls than the shell script was;
>> and it's hard to see what else about the pg_regress change could be
>> affecting the psqls' ability to connect once forked.
>>
>> We probably need to get some Solaris experts involved in diagnosing
>> what's happening. Judging by the buildfarm results you should be able
>> to replicate it fairly easily by doing "make installcheck-parallel"
>> repeatedly.
>>
>
> I will refer this to those experts - my Solaris-fu is a tad rusty these
> days.

As Tom mentioned, the problem is the size of the TCP connection queue
(parameter tcp_conn_req_max_q). The default is 128 in Solaris 10. The
second limit is twice the number of backends. See ./backend/libpq/pqcomm.c:

    /*
     * Select appropriate accept-queue length limit.  PG_SOMAXCONN is only
     * intended to provide a clamp on the request on platforms where an
     * overly large request provokes a kernel error (are there any?).
     */
    maxconn = MaxBackends * 2;
    if (maxconn > PG_SOMAXCONN)
        maxconn = PG_SOMAXCONN;

    err = listen(fd, maxconn);

But what actually happened? I think the following scenario occurred.
The postmaster listens in only one process, while many clients run
truly in parallel. The T2000 server has 32 hardware threads (8 cores
with 4 threads each). These clients generate more TCP/IP connection
requests at one time than the postmaster is able to accept.
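
To make the two limits concrete, here is a rough sketch of the arithmetic. The helper names (requested_backlog, effective_backlog, burst_overflow) are hypothetical and not part of PostgreSQL. It also assumes that the kernel silently caps the listen() backlog at tcp_conn_req_max_q, and that in the worst case the postmaster accepts nothing while the burst arrives.

```c
/*
 * Hypothetical helper mirroring the clamp in pqcomm.c:
 * ask for twice the number of backends, but never more than pg_somaxconn.
 */
static int
requested_backlog(int max_backends, int pg_somaxconn)
{
    int maxconn = max_backends * 2;

    if (maxconn > pg_somaxconn)
        maxconn = pg_somaxconn;
    return maxconn;
}

/*
 * Assumption: the kernel silently reduces the requested backlog to its own
 * limit (tcp_conn_req_max_q on Solaris), so the smaller value wins.
 */
static int
effective_backlog(int requested, int kernel_limit)
{
    return (requested < kernel_limit) ? requested : kernel_limit;
}

/*
 * Worst-case estimate: how many simultaneous connection requests do not fit
 * in the queue if none are accepted while the burst arrives.
 */
static int
burst_overflow(int burst, int queue_len)
{
    return (burst > queue_len) ? burst - queue_len : 0;
}
```

For example, with 100 backends the request is 200, but with the Solaris 10 default of 128 the effective queue is only 128, so a burst of 150 simultaneous requests would leave 22 without a slot under these assumptions.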

Zdenek
