Re: Restrict copying of invalidated replication slots

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Restrict copying of invalidated replication slots
Date: 2025-02-25 18:58:39
Message-ID: CAD21AoDmzxF2vBiu1n_ceUgRoJ2u98JNUupQ=U84mAhg6Wv_wA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 24, 2025 at 10:09 PM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
>
> On Tue, 25 Feb 2025 at 01:03, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > I've checked if this issue exists also on v15 or older, but IIUC it
> > doesn't exist, fortunately. Here is the summary:
> >
> > Scenario-1: the source gets invalidated before the copy function
> > fetches its contents for the first time. In this case, since the
> > source slot's restart_lsn is already an invalid LSN it raises an error
> > appropriately. In v15, we have only one slot invaldation reason, WAL
> > removal, therefore we always reset the slot's restart_lsn to
> > InvalidXlogRecPtr.
> >
> > Scenario-2: the source gets invalidated before the copied slot is
> > created (i.e., between first content copy and
> > create_logical/physical_replication_slot()). In this case, the copied
> > slot could have a valid restart_lsn value that however might point to
> > a WAL segment that might have already been removed. However, since
> > copy_restart_lsn will be an invalid LSN (=0), it's caught by the
> > following if condition:
> >
> > if (copy_restart_lsn < src_restart_lsn ||
> > src_islogical != copy_islogical ||
> > strcmp(copy_name, NameStr(*src_name)) != 0)
> > ereport(ERROR,
> > (errmsg("could not copy replication slot \"%s\"",
> > NameStr(*src_name)),
> > errdetail("The source replication slot was
> > modified incompatibly during the copy operation.")));
> >
> > Scenario-3: the source gets invalidated after creating the copied slot
> > (i.e. after create_logical/physical_replication_slot()). In this case,
> > since the newly copied slot have the same restart_lsn as the source
> > slot, both slots are invalidated.
> >
> > If the above analysis is right, I think the patches are mostly ready.
> > I've made some changes to the patches:
> >
> > - removed src_isinvalidated variable as it's not necessarily necessary.
> > - updated the commit message.
> >
> > Please review them. If there are no further comments on these patches,
> > I'm going to push them.
> >
> I have verified the above scenarios and I confirm the behaviour described.
>
> I have a small doubt.
> In PG_17 and PG_16 we can invalidate physical slots only for
> 'wal_removed' case [1]. And copying of such slot already give error
> 'cannot copy a replication slot that doesn't reserve WAL'. So, in PG17
> and PG16 should we check for invalidated source slot only if we are
> copying logical slots?

Although your analysis is correct, I believe we should retain the
validation check. Even though invalidated physical slots in PostgreSQL
16 and 17always have an invalid restart_lsn, maintaining this check
would be harmless. Furthermore, I prefer to maintain consistency in
the codebase across back branches and the master branch rather than
introducing variations.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Maksim.Melnikov 2025-02-25 19:34:32 Spinlock can be released twice in procsignal.c
Previous Message Robert Haas 2025-02-25 18:52:28 Re: Trigger more frequent autovacuums of heavy insert tables