Re: Service outage: each postgres process use the exact amount of the configured work_mem [(AUTO-RE)SOLVED...]

From: Moreno Andreo <moreno(dot)andreo(at)evolu-s(dot)it>
To: Igor Neyman <ineyman(at)perceptron(dot)com>, Melvin Davidson <melvin6925(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Service outage: each postgres process use the exact amount of the configured work_mem [(AUTO-RE)SOLVED...]
Date: 2017-04-14 21:40:40
Message-ID: 186b6b6a-2a2d-3241-f70d-09e59712750a@evolu-s.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Il 14/04/2017 21:15, Igor Neyman ha
scritto:<br>
</div>
<blockquote
cite="mid:DM5PR07MB281034984AF523DC4E0095C6DA050(at)DM5PR07MB2810(dot)namprd07(dot)prod(dot)outlook(dot)com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered
medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;
color:black;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman",serif;
color:black;}
tt
{mso-style-priority:99;
font-family:"Courier New";}
span.gmail-m-6686708562138806043gmail-hoenzb
{mso-style-name:gmail-m_-6686708562138806043gmail-hoenzb;}
span.EmailStyle20
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle21
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle22
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:windowtext">From:</span></b><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:windowtext">
Moreno Andreo [<a class="moz-txt-link-freetext" href="mailto:moreno(dot)andreo(at)evolu-s(dot)it">mailto:moreno(dot)andreo(at)evolu-s(dot)it</a>]
<br>
<b>Sent:</b> Friday, April 14, 2017 3:11 PM<br>
<b>To:</b> Igor Neyman <a class="moz-txt-link-rfc2396E" href="mailto:ineyman(at)perceptron(dot)com">&lt;ineyman(at)perceptron(dot)com&gt;</a>;
Melvin Davidson <a class="moz-txt-link-rfc2396E" href="mailto:melvin6925(at)gmail(dot)com">&lt;melvin6925(at)gmail(dot)com&gt;</a><br>
<b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:pgsql-general(at)postgresql(dot)org">pgsql-general(at)postgresql(dot)org</a><br>
<b>Subject:</b> Re: [GENERAL] Service outage: each
postgres process use the exact amount of the configured
work_mem<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p><i><span style="color:red">Attention: This email was sent
from someone outside of Perceptron. Always exercise
caution when opening attachments or clicking links from
unknown senders or when receiving unexpected emails.</span></i><span
style="color:red"><o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">Il 14/04/2017 21:06, Igor Neyman ha
scritto:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:windowtext">From:</span></b><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:windowtext">
Moreno Andreo [<a moz-do-not-send="true"
href="mailto:moreno(dot)andreo(at)evolu-s(dot)it">mailto:moreno(dot)andreo(at)evolu-s(dot)it</a>]
<br>
<b>Sent:</b> Friday, April 14, 2017 3:00 PM<br>
<b>To:</b> Igor Neyman <a moz-do-not-send="true"
href="mailto:ineyman(at)perceptron(dot)com">&lt;ineyman(at)perceptron(dot)com&gt;</a>;
Melvin Davidson
<a moz-do-not-send="true"
href="mailto:melvin6925(at)gmail(dot)com">&lt;melvin6925(at)gmail(dot)com&gt;</a><br>
<b>Cc:</b> <a moz-do-not-send="true"
href="mailto:pgsql-general(at)postgresql(dot)org">pgsql-general(at)postgresql(dot)org</a><br>
<b>Subject:</b> Re: [GENERAL] Service outage: each
postgres process use the exact amount of the
configured work_mem</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p><i><span style="color:red">Attention: This email was sent
from someone outside of Perceptron. Always exercise
caution when opening attachments or clicking links
from unknown senders or when receiving unexpected
emails.</span></i><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<div>
<p class="MsoNormal">Il 14/04/2017 20:40, Igor Neyman ha
scritto:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"> <o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<p class="MsoNormal">E. How many users were
connected when the problem occurred?<o:p></o:p></p>
</div>
</div>
</blockquote>
<p class="MsoNormal">About 350 connections<br>
<br>
Thanks<br>
<br>
<span style="color:#1F497D">______________________________________________________________________________________</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Probably
that is your problem, if you don’t have connection
pooler.</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">I’d
recommend to start with PgBouncer, very light-weight
and easy to configure.</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Regards,</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Igor
Neyman</span><o:p></o:p></p>
<p> <o:p></o:p></p>
</blockquote>
<p>I'm building such a system, but I don't think that's
the problem because now, with 30 active connections, it
still does not work (actually it's extremely slow).<o:p></o:p></p>
<div style="border:none;border-bottom:solid windowtext
1.5pt;padding:0in 0in 1.0pt 0in">
<p>Both servers are far from being under heavy load (so
they were before...)<o:p></o:p></p>
</div>
<p><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Having
total of 350 connections and only 30 of them being
active, is the exact case where connection pooler
helps.</span><o:p></o:p></p>
<p><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Igor
Neyman</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</blockquote>
<p class="MsoNormal">Sorry, I misexplained.<br>
4 hours ago we were in rush hour and the server was running
with its usual load (graphs do not show spikes in server
load), and we had about 350 connections.<br>
To troubleshoot the whole thing, I needed to restart the
servers, in the meantime customers disconnected.<br>
Now I have 30 of them (most of them alive), select * from
pg_stat_activity shows 30 connections.<br>
That's why I don't think it's a connection # problem.<br>
<span style="color:#1F497D">____________________________________________________________________________________________________________________<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Right.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">But you had
300 connections when the problem occurred, and leads me to
believe that pgBouncer is needed.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Igor Neyman</span><br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"> </p>
</div>
</blockquote>
</div>
</div>
</blockquote>
For whom may be in interest.....<br>
After creating another VM for the Windows instance, using a snapshot
from past night (and choosing a good client to test... our
customer's has some other problems, I think........) everything
worked.<br>
No lags, no delays. <br>
Fast as usual. <br>
So we switched back to the original server. <br>
Everything worked. <br>
<br>
This is the WORST situation. <br>
Honestly, I didn't understand which server and module was the
problem, neither did figure out the cause. I found *nothing* in
logs.<br>
It can happen again.<br>
And I'll be in the same situation.<br>
<br>
I need to hurry up with the new structure (cluster primary/standby
with pgpool II for pooling and load balancing and barman for
backup), that is growing in the test lab.<br>
<br>
Being on Google Cloud Platform, this makes me suspect of some
"strange" problem (next Tuesday I'll ask them) on their
structure.... but on the other side I think they'd be alerting us on
some issues they're experiencing.<br>
<br>
We designed this application 6 years ago, in a hurry (long story
short: we needed to build a product to be *absolutely* sold before
fall 2012, and we started in July 2011) and went a bit superficial
with some modules of the program. One of them was database design. I
was using Postgresql since 2005, but I never used it in scenarios
that go over having 2 or 3 databases that collect logs and do some
reporting, so I was lacking experience in *battle* scenarios like
this, so I went with a design in which every customer has its own
role, which makes him authenticate, and with this role he can
connect to his database and to some of his colleagues' (imagine a
group of 10... if everyone connects to everyone, it's 100
connections!). Now we're updating this, so only one role is used for
everyone, and authentication is made in other ways, and I think that
in this situation pgbouncer will help a little, but with the new
design a pooler will lend us a great hand.<br>
<br>
Thanks for your patience<br>
Happy Easter<br>
Moreno.<br>
</body>
</html>

Attachment Content-Type Size
unknown_filename text/html 12.8 KB

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2017-04-15 17:49:30 Re: Error During PostGIS Build From Source on Linux
Previous Message Osahon Oduware 2017-04-14 19:20:44 Re: Error During PostGIS Build From Source on Linux