Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-10-27 08:51:56
Message-ID: CAJpy0uA2M0DmUMRJ6VZkcuPWdgnwd6m5jGqfiBG4Y6Nm6dumiw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 25, 2023 at 3:15 PM Drouvot, Bertrand
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On 10/25/23 5:00 AM, shveta malik wrote:
> > On Tue, Oct 24, 2023 at 11:54 AM Drouvot, Bertrand
> > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >>
> >> Hi,
> >>
> >> On 10/23/23 2:56 PM, shveta malik wrote:
> >>> On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand
> >>> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >>
> >>>> We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there
> >>>> is new synced slot(s) to be created on the standby. Do we want to keep this behavior
> >>>> for V1?
> >>>>
> >>>
> >>> I think for the slotsync workers case, we should reduce the naptime in
> >>> the launcher to say 30sec and retain the default one of 3mins for
> >>> subscription apply workers. Thoughts?
> >>>
> >>
> >> Another option could be to keep DEFAULT_NAPTIME_PER_CYCLE and create a new
> >> API on the standby that would refresh the list of sync slot at wish, thoughts?
> >>
> >
> > Do you mean API to refresh list of DBIDs rather than sync-slots?
> > As per current design, launcher gets DBID lists for all the failover
> > slots from the primary at intervals of DEFAULT_NAPTIME_PER_CYCLE.
>
> I mean an API to get a newly created slot on the primary being created/synced on
> the standby at wish.
>
> Also let's imagine this scenario:
>
> - create logical_slot1 on the primary (and don't start using it)
>
> Then on the standby we'll get things like:
>
> 2023-10-25 08:33:36.897 UTC [740298] LOG: waiting for remote slot "logical_slot1" LSN (0/C00316A0) and catalog xmin (752) to pass local slot LSN (0/C0049530) and and catalog xmin (754)
>
> That's expected and due to the fact that ReplicationSlotReserveWal() does set the slot
> restart_lsn to a value < at the corresponding restart_lsn slot on the primary.
>
> - create logical_slot2 on the primary (and start using it)
>
> Then logical_slot2 won't be created/synced on the standby until there is activity on logical_slot1 on the primary
> that would produce things like:
>
> 2023-10-25 08:41:35.508 UTC [740298] LOG: wait over for remote slot "logical_slot1" as its LSN (0/C005FFD8) and catalog xmin (756) has now passed local slot LSN (0/C0049530) and catalog xmin (754)
>
> With this new dedicated API, it will be:
>
> - clear that the API call is "hanging" until there is some activity on the newly created slot
> (currently there is "waiting for remote slot " message in the logfile as mentioned above but
> I'm not sure that's enough)
>
> - be possible to create/sync logical_slot2 in the example above without waiting for activity
> on logical_slot1.
>
> Maybe we should change our current algorithm during slot creation so that a newly created inactive
> slot on the primary does not block other newly created "active" slots on the primary to be created
> on the standby? Depending on how we implement that, the new API may not be needed at all.
>
> Thoughts?
>

I discussed this with my colleague Hou-San and we think that one
possibility could be to somehow accelerate the increment of
restart_lsn on primary. This can be achieved by connecting to the
remote and executing pg_log_standby_snapshot() at reasonable intervals
while waiting on standby during slot creation. This may increase speed
to a reasonable extent w/o having to wait for the user or bgwriter to
do the same for us. The current logical decoding uses a similar
approach to speed up the slot creation. I refer to usage of
LogStandbySnapshot in SnapBuildWaitSnapshot() and
ReplicationSlotReserveWal()).
Thoughts?

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alena Rybakina 2023-10-27 08:53:12 Invalid Path with UpperRel
Previous Message Daniel Gustafsson 2023-10-27 08:50:02 Re: pg_upgrade's object listing