From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Pawel Veselov <pawel(dot)veselov(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: errors with high connections rate
Date: 2012-07-03 07:34:44
Message-ID: 4FF2A094.2050004@ringerc.id.au
Lists: pgsql-general
On 07/03/2012 03:19 PM, Pawel Veselov wrote:
> Hi.
>
> -- problem 1 --
>
> I have an application, using libpq, connecting to postgres 9.1.3
> (Amazon AMI distro).
> The application writes data at a high rate (at this point it's 500
> transactions per second), using multiple threads (at this point it's 800).
>
> These are "worker" threads that receive "messages" that are then
> written out to the DB. There is no connection pool; instead, each
> worker thread maintains its own connection that it uses to write data
> to the database. The connections are kept in pthread thread-specific
> data blocks.
Hmm. To get that kind of TPS with that design, are you running with
fsync=off, or on storage that claims to flush I/O without actually doing
so? Have you checked your crash safety? Or is it just fairly big hardware?
Why are you using so many connections? Unless you have truly monstrous
hardware, your system should achieve considerably greater throughput by
reducing the connection count and queueing bursts of writes. You
wouldn't even need an external pool in your case; just switch to a
producer/consumer model where your accepting threads hand work to a
separate, much smaller set of writer threads that send it to the DB. The
writer threads can then do useful optimisations like multi-valued INSERTs
or COPYing data, doing small batches in transactions, and so on; a rough
sketch follows below.
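To make that concrete, here's a minimal, untested sketch of the shape I
mean: a fixed-size shared queue, a handful of writer threads, one
multi-valued INSERT per batch. The table name ("messages"), queue
capacity, batch size and connection string are all invented for
illustration, not taken from your code:

    /*
     * Producer/consumer sketch: accepting threads enqueue messages; a
     * few writer threads drain the queue in batches and issue one
     * multi-valued INSERT per batch (one implicit transaction each).
     */
    #include <libpq-fe.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define QUEUE_CAP 10000
    #define BATCH_MAX 100

    static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t q_not_empty = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t q_not_full = PTHREAD_COND_INITIALIZER;
    static char *queue[QUEUE_CAP];
    static int q_head, q_tail, q_len;

    /* Accepting threads call this instead of touching the DB. */
    void enqueue_message(char *msg)
    {
        pthread_mutex_lock(&q_lock);
        while (q_len == QUEUE_CAP)                  /* crude backpressure */
            pthread_cond_wait(&q_not_full, &q_lock);
        queue[q_tail] = msg;
        q_tail = (q_tail + 1) % QUEUE_CAP;
        q_len++;
        pthread_cond_signal(&q_not_empty);
        pthread_mutex_unlock(&q_lock);
    }

    /* A handful of these replace the 800 per-worker connections. */
    void *writer_main(void *conninfo)
    {
        PGconn *conn = PQconnectdb((const char *) conninfo);

        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connect: %s", PQerrorMessage(conn));
            return NULL;
        }
        for (;;) {
            char *batch[BATCH_MAX];
            char sql[65536];            /* big enough for a sketch */
            size_t off;
            int n = 0, emitted = 0, i;
            PGresult *res;

            pthread_mutex_lock(&q_lock);
            while (q_len == 0)
                pthread_cond_wait(&q_not_empty, &q_lock);
            while (q_len > 0 && n < BATCH_MAX) {    /* drain a burst */
                batch[n++] = queue[q_head];
                q_head = (q_head + 1) % QUEUE_CAP;
                q_len--;
            }
            pthread_cond_broadcast(&q_not_full);
            pthread_mutex_unlock(&q_lock);

            /* One round trip for the whole batch. snprintf truncates
             * safely; a production version should grow the buffer. */
            off = snprintf(sql, sizeof sql,
                           "INSERT INTO messages (body) VALUES ");
            for (i = 0; i < n && off < sizeof sql; i++) {
                char *lit = PQescapeLiteral(conn, batch[i],
                                            strlen(batch[i]));
                if (lit != NULL) {
                    off += snprintf(sql + off, sizeof sql - off, "%s(%s)",
                                    emitted++ ? "," : "", lit);
                    PQfreemem(lit);
                }
                free(batch[i]);
            }
            if (emitted > 0) {
                res = PQexec(conn, sql);
                if (PQresultStatus(res) != PGRES_COMMAND_OK)
                    fprintf(stderr, "insert: %s", PQerrorMessage(conn));
                PQclear(res);
            }
        }
        PQfinish(conn);     /* not reached; shutdown handling omitted */
        return NULL;
    }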
I'm seriously impressed that your system is working under load at all
with 800 concurrent connections fighting to write all at once.
>
> Can't connect to DB: could not send data to server: Transport endpoint
> is not connected
> could not send startup packet: Transport endpoint is not connected
Is the postmaster forking and then failing because of operating system
resource limits, like max process count, anti-forkbomb measures, max
file handles, etc?
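If you want to verify which limits the server actually runs under (they
often differ from your login shell's), a quick check along these lines,
compiled and run as the same user and environment as the postmaster,
shows the two limits that most often bite here:

    /* Print the soft/hard limits that most often cause fork() and
     * file-handle failures. RLIM_INFINITY prints as a huge number. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/resource.h>

    static void show(const char *name, int which)
    {
        struct rlimit rl;

        if (getrlimit(which, &rl) == 0)
            printf("%-14s soft=%llu hard=%llu\n", name,
                   (unsigned long long) rl.rlim_cur,
                   (unsigned long long) rl.rlim_max);
    }

    int main(void)
    {
        show("RLIMIT_NPROC", RLIMIT_NPROC);     /* max processes */
        show("RLIMIT_NOFILE", RLIMIT_NOFILE);   /* max open files */
        return 0;
    }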
> -- problem 2 --
>
> While trying to debug this (with strace), I could never reproduce it
> well enough to see what's going on, but sometimes I get another error:
> "too many users connected". Even restarting postmaster doesn't help.
> The postmaster is running with -N810, and the role has a connection
> limit of 1000. Yet the "too many" error starts creeping up only after
> 275 connections are opened (counted by successful connect() from strace).
>
> Any idea where should I dig?
See how many connections the *server* thinks exist by examining
pg_stat_activity; in 9.1 it has one row per backend, so
SELECT count(*) FROM pg_stat_activity; gives the server-side count.
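If it's more convenient to watch that from C alongside your worker
code, a trivial libpq check (the conninfo string here is just a
placeholder) would be:

    /* Report how many backends the server currently sees. */
    #include <libpq-fe.h>
    #include <stdio.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb("dbname=postgres");  /* placeholder */
        PGresult *res;

        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connect: %s", PQerrorMessage(conn));
            return 1;
        }
        res = PQexec(conn, "SELECT count(*) FROM pg_stat_activity");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("server-side connections: %s\n", PQgetvalue(res, 0, 0));
        PQclear(res);
        PQfinish(conn);
        return 0;
    }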
Check dmesg and the PostgreSQL server logs to see if you're hitting
operating system limits. Look for fork() failures, unexplained
segfaults, etc.
--
Craig Ringer