From: | John H <johnhyvr(at)gmail(dot)com> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Allow logical failover slots to wait on synchronous replication |
Date: | 2024-07-08 19:08:58 |
Message-ID: | CA+-JvFvvVn7ab0fWCh+Bqj0khSYsnbvxiZsktj8k+OOB-V_uZw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
Thanks Bertrand for taking a look at the patch.
On Mon, Jun 17, 2024 at 8:19 AM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> + int mode = SyncRepWaitMode;
>
> It's set to SyncRepWaitMode and then never change. Worth to get rid of "mode"?
>
I took a deeper look at this with GDB and I think it's necessary to
cache the value of "mode".
We first check:
if (mode == SYNC_REP_NO_WAIT)
return true;
However after this check it's possible to receive a SIGHUP changing
SyncRepWaitMode
to SYNC_REP_NO_WAIT (e.g. synchronous_commit = 'on' -> 'off'), leading
to lsn[-1].
> 2 ====
>
> + static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};
>
> I did some testing and saw that the lsn[] values were not always set to
> InvalidXLogRecPtr right after. It looks like that, in that case, we should
> avoid setting the lsn[] values at compile time. Then, what about?
>
> 1. remove the "static".
>
> or
>
> 2. keep the static but set the lsn[] values after its declaration.
I'd prefer to keep the static because it reduces unnecessary
contention on SyncRepLock if logical client has fallen behind.
I'll add a change with your second suggestion.
> 3 ====
>
> - /*
> - * Return false if not all the standbys have caught up to the specified
> - * WAL location.
> - */
> - if (caught_up_slot_num != standby_slot_names_config->nslotnames)
> - return false;
> + if (!XLogRecPtrIsInvalid(lsn[mode]) && lsn[mode] >= wait_for_lsn)
> + return true;
>
> lsn[] values are/(should have been, see 2 above) just been initialized to
> InvalidXLogRecPtr so that XLogRecPtrIsInvalid(lsn[mode]) will always return
> true. I think this check is not needed then.
Removed.
> Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
> and standby_slot_names not empty?
I didn't think 'standby_slot_names' would impact TPS as much since
it's not grabbing the SyncRepLock but here's a quick test.
Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
pgbench all running from the same server.
Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5
Result with: standby_slot_names =
'replica_1,replica_2,replica_3,replica_4,replica_5'
latency average = 5.600 ms
latency stddev = 2.854 ms
initial connection time = 5.503 ms
tps = 714.148263 (without initial connection time)
Result with: standby_slot_names_from_syncrep = 'true',
synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
latency average = 5.740 ms
latency stddev = 2.543 ms
initial connection time = 4.093 ms
tps = 696.776249 (without initial connection time)
Result with nothing set:
latency average = 5.090 ms
latency stddev = 3.467 ms
initial connection time = 4.989 ms
tps = 785.665963 (without initial connection time)
Again I think it's possible to improve the synchronous numbers if we
cache but I'll try that out in a bit.
--
John Hsu - Amazon Web Services
From | Date | Subject | |
---|---|---|---|
Next Message | John H | 2024-07-08 19:12:13 | Re: Allow logical failover slots to wait on synchronous replication |
Previous Message | Melanie Plageman | 2024-07-08 18:25:16 | Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin |