Re: Allow logical failover slots to wait on synchronous replication

From: John H <johnhyvr(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Allow logical failover slots to wait on synchronous replication
Date: 2024-07-08 19:08:58
Message-ID: CA+-JvFvvVn7ab0fWCh+Bqj0khSYsnbvxiZsktj8k+OOB-V_uZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks Bertrand for taking a look at the patch.

On Mon, Jun 17, 2024 at 8:19 AM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:

>
> + int mode = SyncRepWaitMode;
>
> It's set to SyncRepWaitMode and then never change. Worth to get rid of "mode"?
>

I took a deeper look at this with GDB and I think it's necessary to
cache the value of "mode".
We first check:

if (mode == SYNC_REP_NO_WAIT)
return true;

However after this check it's possible to receive a SIGHUP changing
SyncRepWaitMode
to SYNC_REP_NO_WAIT (e.g. synchronous_commit = 'on' -> 'off'), leading
to lsn[-1].

> 2 ====
>
> + static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};
>
> I did some testing and saw that the lsn[] values were not always set to
> InvalidXLogRecPtr right after. It looks like that, in that case, we should
> avoid setting the lsn[] values at compile time. Then, what about?
>
> 1. remove the "static".
>
> or
>
> 2. keep the static but set the lsn[] values after its declaration.

I'd prefer to keep the static because it reduces unnecessary
contention on SyncRepLock if logical client has fallen behind.
I'll add a change with your second suggestion.

> 3 ====
>
> - /*
> - * Return false if not all the standbys have caught up to the specified
> - * WAL location.
> - */
> - if (caught_up_slot_num != standby_slot_names_config->nslotnames)
> - return false;
> + if (!XLogRecPtrIsInvalid(lsn[mode]) && lsn[mode] >= wait_for_lsn)
> + return true;
>
> lsn[] values are/(should have been, see 2 above) just been initialized to
> InvalidXLogRecPtr so that XLogRecPtrIsInvalid(lsn[mode]) will always return
> true. I think this check is not needed then.

Removed.

> Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
> and standby_slot_names not empty?

I didn't think 'standby_slot_names' would impact TPS as much since
it's not grabbing the SyncRepLock but here's a quick test.
Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
pgbench all running from the same server.

Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5

Result with: standby_slot_names =
'replica_1,replica_2,replica_3,replica_4,replica_5'

latency average = 5.600 ms
latency stddev = 2.854 ms
initial connection time = 5.503 ms
tps = 714.148263 (without initial connection time)

Result with: standby_slot_names_from_syncrep = 'true',
synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'

latency average = 5.740 ms
latency stddev = 2.543 ms
initial connection time = 4.093 ms
tps = 696.776249 (without initial connection time)

Result with nothing set:

latency average = 5.090 ms
latency stddev = 3.467 ms
initial connection time = 4.989 ms
tps = 785.665963 (without initial connection time)

Again I think it's possible to improve the synchronous numbers if we
cache but I'll try that out in a bit.

--
John Hsu - Amazon Web Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John H 2024-07-08 19:12:13 Re: Allow logical failover slots to wait on synchronous replication
Previous Message Melanie Plageman 2024-07-08 18:25:16 Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin