Re: Streaming Replication Networking Best Practices?

From: Don Seiler <don(at)seiler(dot)us>
To: Johannes Truschnigg <johannes(at)truschnigg(dot)info>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Streaming Replication Networking Best Practices?
Date: 2018-05-14 19:15:29
Message-ID: CAHJZqBD4SPfw8xGvhXd092W2gJL3sG1aK+jyrwe-CPtd0JCj6Q@mail.gmail.com
Lists: pgsql-admin

On Mon, May 14, 2018 at 1:31 PM, Johannes Truschnigg <johannes(at)truschnigg(dot)info> wrote:

>
> Do you happen to have historical host-monitoring data available for when
> the replication interruption happened? You should definitely check for CPU
> (on both sides) and I/O (on the receiver/secondary) saturation.
>

We do have Grafana and Zenoss info going way back; I'll see if I can get a
login there.

> I remember when we first set up streaming replication, back then under
> postgres 9.0, the replication connection defaulted to using TLS/SSL; at the
> time with SSL/TLS compression enabled. The huge extra work that this
> incurred on the CPUs involved regularly made the WAL sender on the primary
> break streaming replication because it couldn't possibly keep up with the
> data that was being pushed into its encrypted & compressed TCP connection
> over a 10G link. (Linux's excellent perf tool proved invaluable in
> determining the exact cause for the high CPU load inside the postgres
> processes; once we had re-compiled OpenSSL without compression, the problem
> went away.)
>
> Now of course modern TLS library versions don't implement compression any
> more, and the streaming ciphers are most probably hardware accelerated for
> your combination of hard- and software, but the lesson we learned back then
> may still be worth keeping in mind...
>

Very interesting read. I just re-examined all of our settings in
postgresql.conf, pg_hba.conf and recovery.conf and we don't have SSL enabled
anywhere there. I'm going to assume that this isn't a bottleneck in our
case then.
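A check like this can be scripted so it is repeatable across hosts. The sketch below is only an illustration under stated assumptions: the helper names (effective_ssl, replication_sslmode, replication_may_use_ssl) are hypothetical, it inspects file text only (settings made via ALTER SYSTEM in postgresql.auto.conf or on the command line are ignored), and it assumes conninfo values contain no spaces. The postgresql.conf that matters is the primary's, since the primary is the server side of the replication connection; its `ssl` setting defaults to off. If `sslmode` is absent from `primary_conninfo`, libpq defaults to `prefer`, so SSL would still be negotiated whenever the primary has `ssl = on`.

```python
import re

def effective_ssl(postgresql_conf: str) -> bool:
    """Hypothetical helper: True if the last uncommented 'ssl' line enables SSL.

    The server default is off, so a file with no 'ssl' line yields False.
    """
    enabled = False
    for raw in postgresql_conf.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        # Matches 'ssl = on', 'ssl on', "ssl = 'on'" but not 'ssl_cert_file = ...'
        m = re.match(r"^ssl\s*=?\s*'?(\w+)'?$", line, re.IGNORECASE)
        if m:
            enabled = m.group(1).lower() in ("on", "true", "yes", "1")
    return enabled

def replication_sslmode(recovery_conf: str) -> str:
    """Hypothetical helper: sslmode from primary_conninfo, else libpq's default 'prefer'."""
    for line in recovery_conf.splitlines():
        m = re.match(r"^\s*primary_conninfo\s*=\s*'(.*)'", line)
        if m:
            # Assumes space-separated key=value pairs without quoted values.
            for pair in m.group(1).split():
                key, _, value = pair.partition("=")
                if key == "sslmode":
                    return value
    return "prefer"

def replication_may_use_ssl(primary_conf: str, recovery_conf: str) -> bool:
    """True if SSL *can* be negotiated on the replication connection.

    With sslmode=allow, libpq tries a non-SSL connection first, so True is
    only an upper bound; False means SSL cannot be in use.
    """
    if replication_sslmode(recovery_conf) == "disable":
        return False
    return effective_ssl(primary_conf)
```

For example, with a primary config containing only a commented-out `#ssl = on`, replication_may_use_ssl returns False regardless of what recovery.conf says.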

> Other than that... have you verified that the network link between your
> hosts can actually live up to your and your manager's expectations in terms
> of bandwidth delivered? iperf3 could help verify that; if the measured
> bandwidth for a single TCP stream lives up to what you'd expect, you can
> probably rule out network-related concerns and concentrate on looking at
> other potential bottlenecks.
>
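To make the single-stream iperf3 check repeatable, iperf3 can emit JSON (run `iperf3 -s` on one host and `iperf3 -c <other-host> -J` on the other; by default it uses a single TCP stream). The sketch below assumes that JSON layout and reads the receiver-side rate from `end.sum_received.bits_per_second`. The helper names, the 10 Gbit/s link speed and the 90% threshold are illustrative assumptions, not recommendations, and the sample numbers are made up.

```python
import json

def single_stream_gbps(iperf3_json: str) -> float:
    """Hypothetical helper: receiver-side throughput in Gbit/s from `iperf3 -J` TCP output."""
    report = json.loads(iperf3_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9

def link_meets_expectation(measured_gbps: float, link_gbps: float = 10.0,
                           min_fraction: float = 0.9) -> bool:
    """True if measured throughput reaches min_fraction of the nominal link speed."""
    return measured_gbps >= link_gbps * min_fraction

# Abridged, made-up sample; real iperf3 JSON output contains many more fields.
sample = json.dumps({
    "end": {
        "sum_sent": {"bits_per_second": 9.41e9},
        "sum_received": {"bits_per_second": 9.38e9},
    }
})

gbps = single_stream_gbps(sample)
print(f"{gbps:.2f} Gbit/s, meets expectation: {link_meets_expectation(gbps)}")
```

If the measured figure clears the threshold, network capacity can probably be ruled out and attention can move to the CPU and I/O checks mentioned earlier in the thread.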

Thanks, I'll play around with some of these tools.

Don.

--
Don Seiler
www.seiler.us
