From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Moreno Andreo <moreno(dot)andreo(at)evolu-s(dot)it>
Cc: PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: # of connections and architecture design
Date: 2017-04-18 16:51:53
Message-ID: CAMkU=1yWMM+7zTQAZWgJ1vPOLOp+UX97JG3kPV8AHUk96b3B-g@mail.gmail.com
Lists: pgsql-general
On Tue, Apr 18, 2017 at 2:42 AM, Moreno Andreo <moreno(dot)andreo(at)evolu-s(dot)it>
wrote:
> Hi all,
> As many of you has read last Friday (and many has tired to help, too,
> and I still thank you very much), I had a bad service outage.
> I was pointed to reduce number of maximum connections using a pooler, and
> that's what I'm building in test lab, but I'm wondering if there's
> something I can do with my overall architecture design.
> ATM we host one database per customer (about 400 now) and every customer
> has two points of access to data:
> - Directly to database, via rubyrep, to replicate the database he has in
> his own machine
> - Via WCF self-hosted web services to read other customers' data
> Every customer can access (and replicate) his database from a number of
> different positions (max 3).
> Customers are organized in groups (max 10 per group), and there is the
> chance that someone accesses someone else's data via WCF.
> For example, pick up a group of 5: everyone running rubyrep with only one
> position enabled, and getting data from others' database.
> If I'm not wrong, it's 5 connections (rubyrep) plus 5*4 (everyone
> connecting to everyone else's database) for WCF, so 25 connections
> Now imagine a group of 10....
> Last Friday I was told that 350 connections is quite a big number and
> things can begin to slow down. Ok. When something slows down I'm used to
> search and find the bottleneck (CPU, RAM, IO, etc). If everything was
> running apparently fine (CPU < 10%, RAM used < 20%, I/O rarely over 20%),
> how can I say there's a bottleneck that's slowing down things? Am I missing
> something?
> Another thing is that on a customer server (with a similar, smaller
> architecture) I _do_ have a connection leak problem that's under
> investigation, but when things begin to slow down I simply run a
> pg_terminate_backend on all connections with an age > 10 min and everything
> goes back to normal. On my server, last Friday, it did not help, so I
> thought that was not the main problem.
> I've got no problems in splitting this architecture in how many servers I
> need, but I think I need some tips on how to design this, in order to avoid
> major issues in the near future (ask for details if needed).
>
> The current PostgreSQL 9.5.6 server is an 8 core VM with 52 GB RAM and
> Debian 8.
> WCF server is Windows 2012 R2 4-core, 16 GB RAM.
>
> While facing the issue none of them showed up any kind of overload and
> their logs were clean.
>
> I'm a bit scared it can happen again.........
>
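The connection arithmetic above generalizes: a group of n members, each
replicating from one position, needs n rubyrep connections plus n*(n-1)
WCF connections in the worst case. A quick sketch (illustrative only, the
function name is made up):

```python
def peak_connections(group_size: int, rubyrep_positions: int = 1) -> int:
    """Worst-case connections for one group: each member holds
    `rubyrep_positions` replication connections to his own database,
    plus one WCF connection to every other member's database."""
    rubyrep = group_size * rubyrep_positions
    wcf = group_size * (group_size - 1)
    return rubyrep + wcf

print(peak_connections(5))   # 25, the group-of-5 example above
print(peak_connections(10))  # 100 for a group of 10
```

So a group of 10 is four times as expensive as a group of 5, which is why
pooling pays off quickly as groups grow.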
The logs being clean doesn't tell you much if your log settings are too
terse.
Is log_lock_waits on? log_checkpoints? track_io_timing (doesn't show up
in the logs, you have to query database views)?
Is log_min_duration_statement set to a reasonable value?
log_autovacuum_min_duration?
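A minimal postgresql.conf sketch covering those parameters might look like
this (the threshold values are illustrative assumptions, not
recommendations):

```
# Log statements slower than 250 ms (example threshold)
log_min_duration_statement = 250ms

# Log sessions that wait longer than deadlock_timeout for a lock
log_lock_waits = on

# Log every checkpoint with timing and buffer statistics
log_checkpoints = on

# Log all autovacuum runs (0 = no minimum duration)
log_autovacuum_min_duration = 0

# Collect I/O timings; visible in pg_stat_database, not the logs
track_io_timing = on
```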
Are you using pg_stat_statements (also doesn't show up in the logs, you have
to query it), and perhaps auto_explain?
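For example, once pg_stat_statements is loaded (via
shared_preload_libraries plus CREATE EXTENSION), a query for the most
time-consuming statements might look like this sketch (column names as in
the 9.5-era view):

```sql
-- Top 10 statements by cumulative execution time
SELECT calls,
       total_time,                       -- total milliseconds in this statement
       total_time / calls AS avg_ms,     -- average per call
       rows,
       query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```

That often points at the real bottleneck even when CPU, RAM, and I/O all
look idle.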
Cheers,
Jeff