Re: Bi-modal streaming replication throughput

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alexis Lê-Quôc <alq(at)datadoghq(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bi-modal streaming replication throughput
Date: 2018-08-14 17:46:45
Message-ID: 20180814174645.tkyxu27ophat3cqk@alap3.anarazel.de
Lists: pgsql-performance

Hi,

On 2018-08-14 15:18:55 +0200, Alexis Lê-Quôc wrote:
> We run a cluster of large, SSD-backed, i3.16xl (64 cores visible to Linux,
> ~500GB of RAM, with 8GB of shared_buffers, fast NVMe drives) nodes, each
> running PG 9.3 on Linux in a vanilla streaming asynchronous replication
> setup: 1 primary node, 1 replica designated for failover (left alone) and
> 6 read replicas, taking queries.

9.3 is extremely old; we've made numerous performance improvements since
then in areas potentially related to your problem.

> Under normal circumstances this is working exactly as planned but when I
> dial up the number of INSERTs on the primary to ~10k rows per second, or
> roughly 50MB of data per second (not enough to saturate the network between
> nodes), read replicas fall hopelessly and consistently behind until read
> traffic is diverted away.

Do you use hot_standby_feedback=on?

> 1. We see read replicas fall behind and we can measure their replication
> throughput to be consistently 1-2% of what the primary is sustaining, by
> measuring the replication delay (in seconds) every second. We quickly get
> that metric to 0.98-0.99 (1 means that replication is completely stuck
> as it falls behind by one second every second). CPU, memory, I/O (per core
> iowait) or network (throughput) as a whole resource are not visibly maxed
> out
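
To make the arithmetic behind that metric explicit: if the delay grows by r
seconds per second of wall time, the replica is applying roughly (1 - r) of
what the primary generates, so 0.98-0.99 corresponds to the quoted 1-2%. A
minimal sketch, assuming the primary produces WAL at a steady rate and that
the delay is sampled as (wall-clock time, delay in seconds) pairs; the
function names are illustrative and not part of any PostgreSQL tooling:

```python
def lag_growth_rate(samples):
    """Seconds of replication delay gained per second of wall time.

    samples: list of (wall_time_seconds, delay_seconds), oldest first.
    """
    (t0, d0), (t1, d1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need samples spanning a positive time interval")
    return (d1 - d0) / (t1 - t0)


def apply_fraction(samples):
    """Rough fraction of the primary's rate that the replica is applying.

    Assumes a steady write rate on the primary: a growth rate of 1.0 means
    replay is stuck, 0.0 means the replica is keeping up exactly.
    """
    return 1.0 - lag_growth_rate(samples)
```

For instance, a delay that grows from 10 s to 128.2 s over 120 s of wall
time gives a growth rate of about 0.985, i.e. the replica applies about 1.5%
of the primary's rate.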

Are individual *cores* maxed out however? IIUC you're measuring overall
CPU util, right? Recovery (streaming replication apply) is largely
single threaded.
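
One way to check this is to look at per-core rather than aggregate
utilization, since a single saturated core out of 64 barely registers
(under 2%) in the overall number. A rough sketch, assuming Linux and the
usual /proc/stat layout (per-core lines "cpuN user nice system idle iowait
irq softirq steal ..."); the function names and the 95% threshold are
arbitrary choices for illustration:

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_jiffies, total_jiffies)} for per-core lines."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Per-core lines are "cpu0", "cpu1", ...; the aggregate line is "cpu".
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        fields = [int(x) for x in parts[1:9]]  # user .. steal
        not_busy = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
        total = sum(fields)
        cores[parts[0]] = (total - not_busy, total)
    return cores


def per_core_busy(before, after):
    """Percent busy per core between two /proc/stat snapshots (as text)."""
    a, b = parse_proc_stat(before), parse_proc_stat(after)
    result = {}
    for name, (busy_b, total_b) in b.items():
        if name not in a:
            continue
        d_busy = busy_b - a[name][0]
        d_total = total_b - a[name][1]
        result[name] = 100.0 * d_busy / d_total if d_total > 0 else 0.0
    return result


def saturated_cores(before, after, threshold=95.0):
    """Names of cores at or above the given busy percentage."""
    busy = per_core_busy(before, after)
    return sorted(name for name, pct in busy.items() if pct >= threshold)


def sample_live(interval=1.0):
    """Take two live snapshots on a Linux host and report saturated cores."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.read()
    return saturated_cores(before, after)
```

Calling sample_live() on the replica while it is lagging would show whether
one core is pinned while the box as a whole looks idle.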

> Here are some settings that may help and a perf profile of a recovery
> process that runs without any competing read traffic processing the INSERT
> backlog (I don't unfortunately have the same profile on a lagging read
> replica).

Unfortunately that's not going to help us much in identifying the
contention...

> + 30.25% 26.78% postgres postgres [.] mdnblocks

This I've likely fixed ~two years back:

http://archives.postgresql.org/message-id/72a98a639574d2e25ed94652848555900c81a799

> + 18.64% 18.64% postgres postgres [.] 0x00000000000fde6a

Hm, too bad that this is without a symbol - 18% self is quite a
bit. What perf options are you using?

> + 4.74% 4.74% postgres [kernel.kallsyms] [k] copy_user_enhanced_fast_string

It's possible that a slightly larger shared_buffers setting would help you.

It'd probably be more helpful to look at a perf report --no-children for
this kind of analysis.
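
The point of --no-children is that entries get ordered by self time, which
is what matters when hunting for the function that actually burns the CPU.
As an illustration only, the rows quoted above can be re-ranked by their
self column after the fact. This sketch assumes unquoted rows in the
"children% self% command object [kind] symbol" layout shown in this thread;
the parser is hypothetical and not a perf feature:

```python
import re

# One profile row, e.g. "+ 30.25% 26.78% postgres postgres [.] mdnblocks"
_ROW = re.compile(
    r"^\s*\+?\s*(?P<children>[\d.]+)%\s+(?P<self>[\d.]+)%\s+"
    r"(?P<command>\S+)\s+(?P<object>\S+)\s+\[(?P<kind>.)\]\s+(?P<symbol>\S+)"
)


def parse_rows(report_text):
    """Parse profile rows, skipping any line that does not match the layout."""
    rows = []
    for line in report_text.splitlines():
        m = _ROW.match(line)
        if m is None:
            continue
        rows.append({
            "children": float(m["children"]),
            "self": float(m["self"]),
            "object": m["object"],
            "symbol": m["symbol"],
            # A bare address means perf could not resolve a symbol name.
            "resolved": not m["symbol"].startswith("0x"),
        })
    return rows


def rank_by_self(report_text):
    """Rows ordered by self time, highest first."""
    return sorted(parse_rows(report_text), key=lambda r: r["self"], reverse=True)
```

Entries flagged as unresolved are the ones where debug symbols are worth
chasing before drawing conclusions.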

Greetings,

Andres Freund
