Re: # of connections and architecture design

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Moreno Andreo <moreno(dot)andreo(at)evolu-s(dot)it>
Cc: PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: # of connections and architecture design
Date: 2017-04-18 16:51:53
Message-ID: CAMkU=1yWMM+7zTQAZWgJ1vPOLOp+UX97JG3kPV8AHUk96b3B-g@mail.gmail.com
Lists: pgsql-general

On Tue, Apr 18, 2017 at 2:42 AM, Moreno Andreo <moreno(dot)andreo(at)evolu-s(dot)it>
wrote:

> Hi all,
> As many of you have read last Friday (and many have tried to help, too,
> and I still thank you very much), I had a bad service outage.
> I was advised to reduce the maximum number of connections using a pooler,
> and that's what I'm building in my test lab, but I'm wondering if there's
> something I can do with my overall architecture design.
> ATM we host one database per customer (about 400 now) and every customer
> has two points of access to data:
> - Directly to the database, via rubyrep, to replicate the database he has
> on his own machine
> - Via WCF self-hosted web services to read other customers' data
> Every customer can access (and replicate) his database from a number of
> different positions (max 3).
> Customers are organized in groups (max 10 per group), and there is the
> chance that someone accesses someone else's data via WCF.
> For example, take a group of 5: everyone running rubyrep with only one
> position enabled, and getting data from the others' databases.
> If I'm not wrong, that's 5 connections (rubyrep) plus 5*4 (everyone
> connecting to everyone else's database) for WCF, so 25 connections in total.
> Now imagine a group of 10....
> Last Friday I was told that 350 connections is quite a big number and
> things can begin to slow down. OK. When something slows down I'm used to
> searching for and finding the bottleneck (CPU, RAM, I/O, etc.). If
> everything was apparently running fine (CPU < 10%, RAM used < 20%, I/O
> rarely over 20%), how can I say there's a bottleneck slowing things down?
> Am I missing something?
> Another thing is that on a customer server (with a similar, smaller
> architecture) I _do_ have a connection leak problem that's under
> investigation, but when things begin to slow down I simply run
> pg_terminate_backend on all connections with an age > 10 min and everything
> goes back to normal. On my server, last Friday, it did not help, so I
> figured that was not the main problem.
> I have no problem splitting this architecture across as many servers as I
> need, but I think I need some tips on how to design it, in order to avoid
> major issues in the near future (ask for details if needed).
>
> The current PostgreSQL 9.5.6 server is an 8-core VM with 52 GB RAM,
> running Debian 8.
> The WCF server is a 4-core Windows 2012 R2 machine with 16 GB RAM.
>
> While facing the issue, neither of them showed any kind of overload and
> their logs were clean.
>
> I'm a bit scared it can happen again...
>
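
To put rough numbers on the worst case described above: with n customers in
a group, each replicating from a single position and everyone reading
everyone else's data over WCF, that is n rubyrep connections plus n*(n-1)
WCF connections. A quick sketch of how that scales (just the arithmetic; it
can be run against any database):

    -- worst-case connections per group of size n
    SELECT n                AS group_size,
           n                AS rubyrep_conns,
           n * (n - 1)      AS wcf_conns,
           n + n * (n - 1)  AS total_conns
    FROM generate_series(2, 10) AS n;
    -- n = 5  ->  5 + 20 =  25
    -- n = 10 -> 10 + 90 = 100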

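On the pg_terminate_backend workaround mentioned above, I assume the cleanup
looks roughly like this (a sketch, not the actual query from that setup):

    -- terminate client sessions older than 10 minutes, sparing our own
    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE backend_start < now() - interval '10 minutes'
      AND pid <> pg_backend_pid();
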
The logs being clean doesn't help much if your log settings are too terse.

Is log_lock_waits on? log_checkpoints? track_io_timing (that one doesn't
show up in the logs; you have to query the statistics views)?
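
A quick way to check them, and to see what track_io_timing buys you once
it's on (a sketch; SHOW works just as well for the settings):

    SELECT name, setting
    FROM pg_settings
    WHERE name IN ('log_lock_waits', 'log_checkpoints', 'track_io_timing');

    -- with track_io_timing = on, time spent waiting on I/O (in ms) shows up
    -- in the statistics views, for example per database:
    SELECT datname, blk_read_time, blk_write_time
    FROM pg_stat_database
    ORDER BY blk_read_time + blk_write_time DESC;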

Is log_min_duration_statement set to a reasonable value?
log_autovacuum_min_duration?
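
For example (assumed starting values only, not a recommendation for your
workload; both take effect on a reload):

    ALTER SYSTEM SET log_min_duration_statement = '1s';  -- log statements slower than 1s
    ALTER SYSTEM SET log_autovacuum_min_duration = 0;    -- log every autovacuum run
    SELECT pg_reload_conf();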

Are you using pg_stat_statements (which also doesn't show up in the logs;
you have to query it), and perhaps auto_explain?
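
Both are contrib modules; pg_stat_statements has to be preloaded via
shared_preload_libraries (restart required), and auto_explain is typically
loaded the same way. Once pg_stat_statements is loaded and the extension
created, something like this shows where the time is going:

    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    SELECT calls, total_time, mean_time, query
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;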

Cheers,

Jeff
