Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-08-08 05:01:43
Message-ID: CAJpy0uDicCvuQ8LxCvzjL6dAR=HUKqaOdnhoc6v+41M5vuWiUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 7, 2023 at 3:17 PM Drouvot, Bertrand
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On 8/4/23 1:32 PM, shveta malik wrote:
> > On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand
> > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >> On 7/28/23 4:39 PM, Bharath Rupireddy wrote:
>
> >> Sorry to be late, but I gave a second thought and I wonder if we really need this design.
> >> (i.e start a logical replication background worker on the standby to sync the slots).
> >>
> >> Wouldn't that be simpler to "just" update the sync slots "metadata"
> >> as the https://github.com/EnterpriseDB/pg_failover_slots module (mentioned by Peter
> >> up-thread) is doing?
> >> (making use of LogicalConfirmReceivedLocation(), LogicalIncreaseXminForSlot()
> >> and LogicalIncreaseRestartDecodingForSlot(), If I read synchronize_one_slot() correctly).
> >>
> >
> > Agreed. It would be simpler to just update the metadata. I think you
> > have not got chance to review the latest posted patch ('v10-0003')
> > yet, it does the same.
>
> Thanks for the feedback! Yeah, I did not look at v10 in details and was
> looking at the email thread only.
>
> Indeed, I now see that 0003 does update the metadata in local_slot_advance(),
> that's great!
>
> >
> > But I do not quite get it as in how can we do it w/o starting a
> > background worker?
>
> Yeah, agree that we still need background workers.
> What I meant was to avoid to use "logical replication background worker"
> (aka through logicalrep_worker_launch()) to sync the slots.
>

Agreed. That is why in v10,v11 patches, we have different infra for
sync-slot worker i.e. it is not relying on "logical replication
background worker" anymore.

> > The question here is how many background workers we
> > need to have. Will one be sufficient or do we need one per db (as done
> > earlier by the original patches in this thread) or are we good with
> > dividing work among some limited number of workers?
> >
> > I feel syncing all slots in one worker may increase the lag between
> > subsequent syncs for a particular slot and if the number of slots are
> > huge, the chances of losing the slot-data is more in case of failure.
> > Starting one worker per db also might not be that efficient as it will
> > increase load on the system (both in terms of background worker and
> > network traffic) especially for a case where the number of dbs are
> > more. Thus starting max 'n' number of workers where 'n' is decided by
> > GUC and dividing the work/DBs among these looks a better option to me.
> > Please see the discussion in and around the email at [1]
> >
> > [1]: https://www.postgresql.org/message-id/CAJpy0uCT%2BnpL4eUvCWiV_MBEri9ixcUgJVDdsBCJSqLd0oD1fQ%40mail.gmail.com
>
> Thanks for the link! If I read the email thread correctly, this discussion
> was before V10 (which is the first version making use of LogicalConfirmReceivedLocation(),
> LogicalIncreaseXminForSlot(), LogicalIncreaseRestartDecodingForSlot()) means
> before the metadata sync only has been implemented.
>
> While I agree that the approach to split the sync load among workers with the new
> max_slot_sync_workers GUC seems reasonable without the sync only feature (pre V10),
> I'm not sure that with the metadata sync only in place the extra complexity to manage multiple
> sync workers is needed.
>
> Maybe we should start some tests/benchmark with only one sync worker to get numbers
> and start from there?

Yes, we can do that performance testing to figure out the difference
between the two modes. I will try to get some statistics on this.

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Xiaoran Wang 2023-08-08 05:11:20 [PATCH] update the comment in SnapshotSetCommandId
Previous Message Rui Zhao 2023-08-08 04:35:38 Re: pg_upgrade fails with in-place tablespace