From: | Don Seiler <don(at)seiler(dot)us> |
---|---|
To: | Johannes Truschnigg <johannes(at)truschnigg(dot)info> |
Cc: | pgsql-admin <pgsql-admin(at)postgresql(dot)org> |
Subject: | Re: Streaming Replication Networking Best Practices? |
Date: | 2018-05-14 19:15:29 |
Message-ID: | CAHJZqBD4SPfw8xGvhXd092W2gJL3sG1aK+jyrwe-CPtd0JCj6Q@mail.gmail.com |
Lists: | pgsql-admin |
On Mon, May 14, 2018 at 1:31 PM, Johannes Truschnigg <johannes(at)truschnigg(dot)info> wrote:
>
> Do you happen to have historical host-monitoring data available for when the
> replication interruption happened? You should definitely check for CPU (on
> both sides) and I/O (on the receiver/secondary) saturation.
>
We do have Grafana and Zenoss data going way back. I'll see if I can get a
login there.
> I remember when we first set up streaming replication, back then under
> postgres 9.0, the replication connection defaulted to using TLS/SSL; at the
> time with SSL/TLS compression enabled. The huge extra work that this
> incurred on the CPUs involved regularly made the WAL sender on the primary
> break streaming replication because it couldn't possibly keep up with the
> data that was being pushed into its encrypted & compressed TCP connection
> over a 10G link. (Linux's excellent perf tool proved invaluable in
> determining the exact cause for the high CPU load inside the postgres
> processes; once we had re-compiled OpenSSL without compression, the problem
> went away.)
>
> Now of course modern TLS library versions don't implement compression any
> more, and the streaming ciphers are most probably hardware accelerated for
> your combination of hard- and software, but the lesson we learned back then
> may still be worth keeping in mind...
>
Very interesting read. I just re-examined all of our settings in
postgresql.conf, pg_hba.conf, and recovery.conf, and we don't have SSL enabled
anywhere there. I'm going to assume that this isn't a bottleneck in our
case then.
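
For anyone repeating that check, here is a minimal sketch of what "no SSL anywhere" could look like when scripted. It assumes the usual `name = value` syntax of postgresql.conf and whitespace-separated fields in pg_hba.conf; the function names are hypothetical, and it ignores `include` directives and any `sslmode` in the standby's connection string.

```python
import re

# Boolean spellings treated as "enabled" in this sketch.
TRUTHY = {"on", "true", "yes", "1"}

def ssl_enabled_in_postgresql_conf(text):
    """Return True if the last uncommented `ssl` setting is a truthy value."""
    enabled = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        # Match `ssl = on`, `ssl=on`, `ssl on`, with optional single quotes.
        m = re.match(r"^ssl(?:\s*=\s*|\s+)'?(\w+)'?$", line, re.IGNORECASE)
        if m:
            enabled = m.group(1).lower() in TRUTHY  # last occurrence wins
    return enabled

def hostssl_replication_rules(text):
    """Return pg_hba.conf rules of type hostssl that name the replication keyword."""
    rules = []
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "hostssl":
            if "replication" in fields[1].split(","):
                rules.append(" ".join(fields))
    return rules
```

Note that a plain `host` rule in pg_hba.conf matches both SSL and non-SSL connections, so an empty result from the second function does not by itself prove the link is unencrypted; the `ssl` setting on the primary is the stronger signal. On PostgreSQL 9.5 or later, the pg_stat_ssl view also reports SSL state per connection.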
> Other than that... have you verified that the network link between your
> hosts can actually live up to your and your manager's expectations in terms
> of bandwidth delivered? iperf3 could help verify that; if the measured
> bandwidth for a single TCP stream lives up to what you'd expect, you can
> probably rule out network-related concerns and concentrate on looking at
> other potential bottlenecks.
>
Thanks, I'll play around with some of these tools.
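
To make the "single TCP stream" idea concrete, here is a rough Python sketch of the kind of measurement iperf3 performs. It is not a substitute for iperf3: the function names are hypothetical, it runs sender and receiver in one process over loopback, and Python itself may become the bottleneck well below 10G. For a real test the receiving half would have to run on the standby.

```python
import socket
import threading
import time

def _sink(server, result):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(1 << 16)
            if not chunk:
                break
            total += len(chunk)
    result["bytes"] = total

def measure_single_stream(total_bytes=64 * 1024 * 1024, host="127.0.0.1"):
    """Send total_bytes over one TCP connection.

    Returns (bytes_received, megabits_per_second).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    payload = b"\0" * (1 << 16)
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        sent = 0
        while sent < total_bytes:
            n = min(len(payload), total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()  # wait until the receiver has drained the stream
    elapsed = time.perf_counter() - start
    server.close()

    return result["bytes"], result["bytes"] * 8 / elapsed / 1e6
```

Calling `measure_single_stream()` returns the byte count and an approximate rate in Mbit/s, which can be compared against what the link is supposed to deliver.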
Don.
--
Don Seiler
www.seiler.us