Re: # of connections and architecture design

From: Moreno Andreo <moreno(dot)andreo(at)evolu-s(dot)it>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: # of connections and architecture design
Date: 2017-04-19 08:25:20
Message-ID: e43b66cf-0910-81e5-4515-f0573cd8d1bb@evolu-s.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Il 18/04/2017 18:51, Jeff Janes ha
scritto:<br>
</div>
<blockquote
cite="mid:CAMkU=1yWMM+7zTQAZWgJ1vPOLOp+UX97JG3kPV8AHUk96b3B-g(at)mail(dot)gmail(dot)com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Tue, Apr 18, 2017 at 2:42 AM,
Moreno Andreo <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:moreno(dot)andreo(at)evolu-s(dot)it" target="_blank">moreno(dot)andreo(at)evolu-s(dot)it</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Hi all,<br>
    As many of you has read last Friday (and many has
tired to help, too, and I still thank you very much), I
had a bad service outage.<br>
I was pointed to reduce number of maximum connections
using a pooler, and that's what I'm building in test lab,
but I'm wondering if there's something I can do with my
overall architecture design.<br>
ATM we host one database per customer (about 400 now) and
every customer has two points of access to data:<br>
- Directly to database, via rubyrep, to replicate the
database he has in his own machine<br>
- Wia WCF self-hosted web services to read other customers
data<br>
Every customer can access (and replicate) his database
from a number of different positions (max 3).<br>
Customers are organized in groups (max 10 per group), and
there is the chance that someone accesses someone else's
data via WCF.<br>
For example, pick up a group of 5: everyone running
rubyrep with only one position enabled, and getting data
from others' database.<br>
If I'm not wrong, it's 5 connections (rubyrep) plus 5*4
(everyone connecting to everyone else's database) for WCF,
so 25 connections<br>
Now imagine a group of 10....<br>
Last friday I've been told that 350 connections is quite a
big number and things can begin to slow down. Ok. When
something slows down I'm used to search and find the
bottleneck (CPU, RAM, IO, etc). If everything was running
apparently fine (CPU &lt; 10%, RAM used &lt; 20%, I/O
rarely over 20%), how can I say there's a bottleneck
that's slowing down things? Am I missing something?<br>
Another thing is that on a customer server (with a
similar, smaller architecture)  I _do_ have a connection
leak problem that's under investigation, but when things
begin to slow down I simply run a pg_terminate_backend on
all connection with an age &gt; 10 min and everything goes
back to normal. On my server, last friday, it did not
help, so I thought that was not the main problem.<br>
I've got no problems in splitting this architecture in how
many servers I need, but I think I need some tips on how
to design this, in order to avoid major issues in the near
future (ask for details if needed).<br>
<br>
The current PostgreSQL 9.5.6 server is an 8 core VM with
52 GB RAM and Debian 8.<br>
WCF server is Windows 2012 R2 4-core, 16 GB RAM.<br>
<br>
While facing the issue none of them showed up any kind of
overload and their logs were clean.<br>
<br>
I'm a bit scared it can happen again.........<br>
</blockquote>
<div><br>
</div>
<div>The logs being clean doesn't help much, if your log
settings are set to be too terse.</div>
<div><br>
</div>
<div>Is log_lock_waits on?</div>
</div>
</div>
</div>
</blockquote>
It's off<br>
<blockquote
cite="mid:CAMkU=1yWMM+7zTQAZWgJ1vPOLOp+UX97JG3kPV8AHUk96b3B-g(at)mail(dot)gmail(dot)com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>  log_checkpoints?</div>
</div>
</div>
</div>
</blockquote>
off<br>
<blockquote
cite="mid:CAMkU=1yWMM+7zTQAZWgJ1vPOLOp+UX97JG3kPV8AHUk96b3B-g(at)mail(dot)gmail(dot)com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>  track_io_timing (doesn't show up in the logs, you
have to query database views)?  <br>
</div>
</div>
</div>
</div>
</blockquote>
off (never referred to)<br>
<blockquote
cite="mid:CAMkU=1yWMM+7zTQAZWgJ1vPOLOp+UX97JG3kPV8AHUk96b3B-g(at)mail(dot)gmail(dot)com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>Is log_min_duration_statement set to a reasonable
value?  <br>
</div>
</div>
</div>
</div>
</blockquote>
default value (it's commented out)<br>
<blockquote
cite="mid:CAMkU=1yWMM+7zTQAZWgJ1vPOLOp+UX97JG3kPV8AHUk96b3B-g(at)mail(dot)gmail(dot)com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>log_autovacuum_min_duration?</div>
</div>
</div>
</div>
</blockquote>
default value (commented)<br>
<blockquote
cite="mid:CAMkU=1yWMM+7zTQAZWgJ1vPOLOp+UX97JG3kPV8AHUk96b3B-g(at)mail(dot)gmail(dot)com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>Are you using pg_stat_statement (also doesn't show up
in the logs, you have to query it), </div>
</div>
</div>
</div>
</blockquote>
I'm using pg_stat_statements to keep track of the connections, their
origin and if there's some lock (waiting = true)... obviously in a
given moment I have only a maximum of 2 or 3 with state = 'active',
all others are 'idle'<br>
<blockquote
cite="mid:CAMkU=1yWMM+7zTQAZWgJ1vPOLOp+UX97JG3kPV8AHUk96b3B-g(at)mail(dot)gmail(dot)com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>and perhaps auto_explain?</div>
</div>
</div>
</div>
</blockquote>
never used. (I'll check the docs).<br>
<br>
Thanks<br>
Moreno.<br>
<blockquote
cite="mid:CAMkU=1yWMM+7zTQAZWgJ1vPOLOp+UX97JG3kPV8AHUk96b3B-g(at)mail(dot)gmail(dot)com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>Cheers,</div>
<div><br>
</div>
<div>Jeff</div>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
<p><br>
</p>
</body>
</html>

Attachment Content-Type Size
unknown_filename text/html 7.4 KB

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Karsten Hilbert 2017-04-19 09:55:22 potential extension of psql's \df+ ?
Previous Message John R Pierce 2017-04-19 07:48:40 Re: Large data and slow queries