Re: Allow logical failover slots to wait on synchronous replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: John H <johnhyvr(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Allow logical failover slots to wait on synchronous replication
Date: 2024-07-29 03:11:37
Message-ID: CAJpy0uCRefcBGz_2goD50YiKnk6bOikYaGg6JUwyxJtMwmYvEA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 26, 2024 at 5:11 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Jul 26, 2024 at 3:28 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Tue, Jul 23, 2024 at 10:35 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Jul 9, 2024 at 12:39 AM John H <johnhyvr(at)gmail(dot)com> wrote:
> > > >
> > > > > Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
> > > > > and standby_slot_names not empty?
> > > >
> > > > I didn't think 'standby_slot_names' would impact TPS as much since
> > > > it's not grabbing the SyncRepLock but here's a quick test.
> > > > Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
> > > > pgbench all running from the same server.
> > > >
> > > > Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5
> > > >
> > > > Result with: standby_slot_names =
> > > > 'replica_1,replica_2,replica_3,replica_4,replica_5'
> > > >
> > > > latency average = 5.600 ms
> > > > latency stddev = 2.854 ms
> > > > initial connection time = 5.503 ms
> > > > tps = 714.148263 (without initial connection time)
> > > >
> > > > Result with: standby_slot_names_from_syncrep = 'true',
> > > > synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> > > >
> > > > latency average = 5.740 ms
> > > > latency stddev = 2.543 ms
> > > > initial connection time = 4.093 ms
> > > > tps = 696.776249 (without initial connection time)
> > > >
> > > > Result with nothing set:
> > > >
> > > > latency average = 5.090 ms
> > > > latency stddev = 3.467 ms
> > > > initial connection time = 4.989 ms
> > > > tps = 785.665963 (without initial connection time)
> > > >
> > > > Again I think it's possible to improve the synchronous numbers if we
> > > > cache but I'll try that out in a bit.
> > > >
> > >
> > > Okay, so the tests done till now conclude that we won't get the
> > > benefit by using 'standby_slot_names_from_syncrep'. Now, if we
> > > increase the number of standbys in both lists and still keep ANY 3 in
> > > synchronous_standby_names then the results may vary. We should try to
> > > find out if there is a performance benefit with the use of
> > > synchronous_standby_names in the normal configurations like the one
> > > you used in the above tests to prove the value of this patch.
> > >
> >
> > I didn't fully understand the parameters mentioned above, specifically
> > what 'latency stddev' and 'latency average' represent. But shouldn't
> > we see the benefit/value of this patch by having a setup where a
> > particular standby is slow in sending the response back to primary
> > (could be due to network lag or other reasons) and then measuring the
> > latency in receiving changes on failover-enabled logical subscribers?
> > We can perform this test with both of the below settings and say make
> > D and E slow in sending responses:
> > 1) synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> > 2) standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot.
> >
>
> Yes, I also expect the patch should perform better in such a scenario
> but it is better to test it. Also, irrespective of that, we should
> investigate why the reported case is slower for
> synchronous_standby_names and see if we can improve it.

+1
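
To put the reported case in perspective, here is a quick sketch that
computes how far each configuration's TPS falls below the "nothing set"
run. The figures are copied from John's pgbench results quoted above;
the helper name and dictionary labels are only illustrative, and a
single run per configuration says nothing about run-to-run variance.

```python
# TPS figures copied from the pgbench results quoted earlier in this thread.
BASELINE_TPS = 785.665963  # run with nothing set

results = {
    "standby_slot_names (5 slots)": 714.148263,
    "standby_slot_names_from_syncrep (ANY 3 of 5)": 696.776249,
}


def tps_drop_pct(tps, baseline=BASELINE_TPS):
    """Return the percentage TPS reduction relative to the baseline run."""
    return (baseline - tps) / baseline * 100.0


# Hypothetical helper output: percentage drop per configuration.
drops = {name: tps_drop_pct(tps) for name, tps in results.items()}

for name, drop in drops.items():
    print(f"{name}: {drop:.2f}% below baseline")
```

By this measure the syncrep-based setting is roughly two percentage
points further below baseline than the explicit slot list, which is the
gap worth investigating.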

> BTW, for 2), I think you wanted to say synchronized_standby_slots,
> not standby_slot_names. We have recently changed the GUC name.

yes, sorry, synchronized_standby_slots it is.

thanks
Shveta
