Re: errors with high connections rate

From: "Pawel S(dot) Veselov" <pawel(dot)veselov(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: errors with high connections rate
Date: 2012-07-03 08:26:41
Message-ID: 4FF2ACC1.9060804@gmail.com
Lists: pgsql-general

On 07/03/2012 12:34 AM, Craig Ringer wrote:
> On 07/03/2012 03:19 PM, Pawel Veselov wrote:
>> Hi.
>>
>> -- problem 1 --
>>
>> I have an application, using libpq, connecting to postgres 9.1.3
>> (Amazon AMI distro).
>> The application writes data at a high rate (at this point it's 500
>> transactions per second), using multiple threads (at this point it's 800).
>>
>> These are "worker" threads, that receive "messages" that are then
>> written out to the DB. There is no connection pool; instead, each
>> worker thread maintains its own connection that it uses to write
>> data to the database. The connections are kept in pthread's "specific"
>> data blocks.
>
[skipped, replied to separately]
>
>>
>> Can't connect to DB: could not send data to server: Transport
>> endpoint is not connected
>> could not send startup packet: Transport endpoint is not connected
>
> postmaster forking and failing because of operating system resource
> limits like max proc count, anti-forkbomb measures, max file handles, etc?

If accept() succeeded and fork() then failed, the socket would be closed by
the postmaster (the parent would close it, and no child would exist to
inherit it). Wouldn't that result in ECONNRESET, and not ENOTCONN?
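
For reference, "Transport endpoint is not connected" is the glibc
strerror() text for ENOTCONN, which is why that errno is being discussed
here. Below is a minimal Python sketch, not taken from the application
(which uses libpq from C), that provokes ENOTCONN in the simplest way I
know of: calling getpeername() on a TCP socket that was never connected.
The helper name errno_on_unconnected_socket is made up for illustration,
and this is not a claim about which call inside libpq actually fails.

```python
import errno
import os
import socket


def errno_on_unconnected_socket():
    """Return the errno raised by getpeername() on a never-connected TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.getpeername()  # there is no peer, so this call fails
        except OSError as exc:
            return exc.errno
    return 0  # not expected: getpeername() unexpectedly succeeded


if __name__ == "__main__":
    code = errno_on_unconnected_socket()
    # Print the symbolic name and the platform's message text for each code.
    print(errno.errorcode.get(code, code), "->", os.strerror(code))
    print("ECONNRESET ->", os.strerror(errno.ECONNRESET))
```

The two codes describe different situations: ENOTCONN says the socket has
no peer at all, while ECONNRESET says an established connection was torn
down by the other side.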

>
>> -- problem 2 --
>>
>> As I'm trying to debug this (with strace), I could never reproduce
>> it, at least to see what's going on, but sometimes I get another
>> error: "too many users connected". Even restarting postmaster
>> doesn't help. The postmaster is running with -N810, and the role has
>> connection limit of 1000. Yet, the "too many" error starts creeping
>> up only after 275 connections are opened (counted by successful
>> connect() from strace).
>>
>> Any idea where I should dig?
> See how many connections the *server* thinks exist by examining
> pg_stat_activity.
>
> Check dmesg and the PostgreSQL server logs to see if you're hitting
> operating system limits. Look for fork() failures, unexplained
> segfaults, etc.

That's the thing, no segfaults (dmesg), nothing in the server logs.

It may well be some sort of anti-fork-bomb measure, judging only by the
fact that with enough attempts things do clear up, though I wish there
were some indication of that, and I'm still confused about the error
code being ENOTCONN.
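
As a starting point for checking the limits Craig mentions (max proc
count, max file handles), here is a minimal Python sketch using the
standard resource module. It reports the limits of the process that runs
it, so it only tells you about the postmaster if it is run under the same
account and environment; on Linux the limits of an already running
process can be read from /proc/<pid>/limits instead. The function names
are made up for illustration.

```python
import resource


def describe(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return (soft, hard) limits for process count and open files."""
    wanted = {
        "max processes (RLIMIT_NPROC)": resource.RLIMIT_NPROC,
        "max open files (RLIMIT_NOFILE)": resource.RLIMIT_NOFILE,
    }
    return {
        label: tuple(describe(v) for v in resource.getrlimit(res))
        for label, res in wanted.items()
    }


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
```

With 800 client threads each holding a connection, a soft process limit
below roughly that number would be the first thing I would look at.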
