Re: debugging server connection issue

From: Stephen Constable <sjconsta(at)scisat(dot)ca>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: debugging server connection issue
Date: 2016-03-31 12:37:15
Message-ID: CACfdzQmdT6=dUgUFehn12+sC9trCLFf=h6d76MTwgTKFYQU7QA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

In case anybody looks back on this thread in the future, I fixed the
problem (at least temporarily) by running the following in CentOS 7:
# echo "1025 65535" >/proc/sys/net/ipv4/ip_local_port_range
# sysctl -w net.ipv4.tcp_timestamps=1
# sysctl -w net.ipv4.tcp_tw_recycle=1
# sysctl -w tcp_tw_reuse=1

On both the database server and the client nodes. It seems the clients
were eating up all the available ephemeral ports opening and closing
database connections. I'm going to have a long talk with the developer
about why this is done in the first place :)

Cheers, and thanks for the tips.
--Steve

On Tue, Mar 29, 2016 at 6:51 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 03/29/2016 03:25 PM, Stephen Constable wrote:
> > Sorry, my client environment is Linux.
>
> Hmm, so I was reading win32.c wrong. It is mapping a Windows error
> message to that string.
>
> >
> > My current theory is that my clients are running out of available
> > ephemeral ports, like in this thread:
> >
> http://dba.stackexchange.com/questions/59650/pgbouncer-works-great-but-occasionally-becomes-unavailable
> (but
> > I"m not currently using pg bouncer). I tried pg bouncer before and had
> > the same errors, which in retrospect makes the client-side issue seem
> > more likely. Are there any configuration variables I can set to reduce
> > the number of ephemeral ports required in the postgresql client
> > libraries? Otherwise, I will attempt to reconfigure the OS of the
> > client machines tomorrow morning.
>
> Not sure how that would work. To make a network connection would seem to
> me to require a port.
>
> Are you seeing the same sort of port churn on your 8.4 machine?
>
> Is the fact that is processing results slower maybe giving the ports a
> chance to timeout their wait time, versus not on the newer faster machine?
>
> The issue, to me at least, seems to be less the number of jobs, but the
> number of connections each job is producing. T
> >
> > Thanks,
> > Steve
> >
> > On Tue, Mar 29, 2016 at 4:44 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com
> > <mailto:adrian(dot)klaver(at)aklaver(dot)com>> wrote:
> >
> > On 03/29/2016 01:28 PM, Stephen Constable wrote:
> > > My apologies, I'm not sure what part of the networking stack the
> > > messages are coming from. It also states:
> > > """
> > > could not connect to server: Cannot assign requested address
> > > Is the server running on host "<hostname>" and accepting
> > > TCP/IP connections on port <port>?
> > > """
> >
> > Alright I lied, the above is a Postgres error message. I am just not
> > used to seeing 'Cannot assign requested address'. Turns out it is in
> > interfaces/libpq/win32.c.
> >
> > So your client is running on Windows?
> >
> >
> > > This error is only printed under a 32-job load, never a single
> > job load.
> > >
> > > The processes are indeed connecting over a local network.
> > >
> > > I have only enabled the logging of connections and disconnections
> > since
> > > I figured that would be the most telling :) perhaps that was not
> the
> > > best idea. but, FYI, I see over 5000 such notices in a single
> > minute.
> > > I will reconfigure the logging to be more verbose.
> > >
> > > Thanks,
> > > Steve
> > >
> > > On Tue, Mar 29, 2016 at 4:21 PM Adrian Klaver
> > <adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
> > > <mailto:adrian(dot)klaver(at)aklaver(dot)com
> > <mailto:adrian(dot)klaver(at)aklaver(dot)com>>> wrote:
> > >
> > > On 03/29/2016 01:10 PM, Stephen Constable wrote:
> > > > Hi All,
> > > >
> > > > I'm a new-ish sysadmin working on porting legacy
> > scientific code
> > > from a
> > > > local server/client to new supercomputer environment. My
> > work is
> > > mostly
> > > > done, except that my postgres database doesn't seem to be
> > able to
> > > keep
> > > > up with the new environment. The application is written
> > in-house
> > > in a
> > > > mixture of FORTAN 77 and C, and uses postgres BLOBS as its
> > main data
> > > > store. This application in particular only reads from the
> > > database, it
> > > > never writes, which *should* make it easy to scale.
> > > >
> > > > My main problem is that this client application is unable
> to
> > > connect to
> > > > the database under a modest load (32 simultaneous jobs).
> > The client
> > > > error logs print out messages like "could not connect to
> > server:
> > > Cannot
> > > > assign requested address" and "Cannot connect to database
> > > [runlog]!!!"
> > > > (an important database of ours). The "cannot assign
> requested
> > > address"
> > >
> > > Well those do not look like Postgres error messages to me, so
> > the first
> > > thing would be to determine what part of the stack is
> > generating them.
> > >
> > > Is the client software connecting to the database over a
> network?
> > >
> > > Are you using connection pooling?
> > >
> > > > message makes me think it's a configuration issue. The
> > logs are
> > > flooded
> > > > with hundreds of connection and disconnection notices per
> > > second. This
> > >
> > > Might want to turn off logging connections/disconnections:
> > >
> > >
> >
> http://www.postgresql.org/docs/9.4/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT
> > >
> > > log_connections (boolean)
> > >
> > > log_disconnections (boolean)
> > >
> > > > same code and configuration runs fine on our mid-2000's
> > Solaris
> > > 10 box
> > > > with postgres 8.4 (albeit very slowly) but totally fails
> > with these
> > > > connection errors on a modern Dell system running CentOS 7
> or
> > > FreeBSD 10
> > > > (I tested both) with postgres 9.4.
> > > >
> > > > While the database is under load (and jobs are actively
> > failing),
> > > select
> > > > count(*) from pg_stat_activity returns 30-34 ish
> > connections, show
> > > > max_connections returns 100, and show
> > superuser_reserved_connections
> > > > shows 3. My only other hint is that right after a fresh
> > install of
> > > > CentOS 7 my job success rate was around 50%, and now it has
> > > approached
> > > > approximately 5%, so something is changing over time.
> > > >
> > > > Does anyone have any advice or experience with similar
> issues?
> > >
> > > What else does the Postgres log show besides the
> > > connections/disconnections, that might be of interest?
> > >
> > > What does the system log show?
> > >
> > > >
> > > > Thanks,
> > > > Steve
> > > >
> > >
> > >
> > > --
> > > Adrian Klaver
> > > adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
> > <mailto:adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com
> >>
> > >
> >
> >
> > --
> > Adrian Klaver
> > adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
> >
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Alex Ignatov 2016-03-31 12:49:28 Re: Re: Postgres 9.4.4/9.4.6: plpython2/3 intallation issues on a Windows 7 machine
Previous Message Nik Mitev 2016-03-31 12:35:41 Is it possible to delete a single value from an enum type?