Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-04-21 17:31:03
Message-ID: CAD21AoAA=zuiajwXgXSCYQWo=6oY-=CGLaEqvpfNUTVsLen+CA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 20, 2025 at 10:05 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Feb 19, 2025 at 1:56 AM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > Hi,
>
> Thank you for looking at the patches.
>
> >
> > On Mon, Feb 17, 2025 at 12:07:56PM -0800, Masahiko Sawada wrote:
> > > On Fri, Feb 14, 2025 at 2:35 AM Bertrand Drouvot
> > > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > > > Yeap, that was exactly my point when I mentioned the custodian thread (taking
> > > > into account Tom's comment quoted above).
> > > >
> > >
> > > I've written PoC patches to have the online wal_level change work use
> > > a more generic infrastructure. These patches are still in PoC state
> > > but seem like a good direction to me. Here is a brief explanation for
> > > each patch.
> >
> > Thanks for the patches!
> >
> > > * The 0001 patch introduces "reserved background worker slots". We
> > > allocate max_process_workers + BGWORKER_CLASS_RESERVED at startup, and
> > > if the number of running bgworker exceeds max_worker_processes, only
> > > workers using the reserved slots can be launched. We can request to
> > > use the reserved slots by adding BGWORKER_CLASS_RESERVED flag at
> > > bgworker registration.
> >
> > I had a quick look at 0001 and I think the way that's implemented is reasonnable.
> > I thought this could be defined through a GUC so that extensions can benefit
> > from it. But OTOH the core code should ensure the value is > as the number of
> > reserved slots needed by the core so not using a GUC looks ok to me.
>
> Interesting idea. I kept the reserved slots only for internal use but
> it would be worth considering to use GUC instead.
>
> > > * The 0002 patch introduces "bgtask worker". The bgtask infrastructure
> > > is designed to execute internal tasks in background in
> > > one-worker-per-one-task style. Internally, bgtask workers use the
> > > reserved bgworker so it's guaranteed that they can launch.
> >
> > Yeah.
> >
> > > The
> > > internal tasks that we can request are predefined and this patch has a
> > > dummy task as a placeholder. This patch implements only the minimal
> > > functionality for the online wal_level change work. I've not tested if
> > > this bgtask infrastructure can be used for tasks that we wanted to
> > > offload to the custodian worker.
> >
> > Again, I had a quick look and looks simple enough of our need here. It "just"
> > executes "(void) InternalBgTasks[type].func()" and then exists. That's, I think,
> > a good starting point to add more tasks in the future (if we want to).
>
> Yeah, we might want to extend it further, for example to pass an
> argument to the background task or to ask multiple tasks for the
> single bgtask worker. As far as I can read the custodian patch set,
> the work of removing temp files seems not to require any argument
> though.
>
> >
> > > * The 0003 patch makes wal_level a SIGHUP parameter. We do the online
> > > wal_level change work using the bgtask infrastructure. There are no
> > > major changes from the previous version other than that.
> >
> > It replaces the dummy task introduced in 0002 by the one that suits our needs
> > here (through the new BgTaskWalLevelChange() function).
> >
> > The design looks reasonable to me. Waiting to see if others disagree before
> > looking more closely at the code.
>
> Thanks.

I would like to discuss behavioral and user interface considerations.

Upon further analysis of this patch regarding the conversion of
wal_level to a SIGHUP parameter, I find that supporting all
combinations of wal_level value changes might make less sense.
Specifically, changing to or from 'minimal' would necessitate a
checkpoint, and reducing wal_level to 'minimal' would require
terminating physical replication, WAL archiving, and online backups.
While these operations demand careful consideration, there seems to be
no compelling use case for decreasing to 'minimal'. Furthermore,
increasing wal_level from 'minimal' is typically a one-time operation
during a database's lifetime. Therefore, we should weigh the benefits
against the implementation complexity.

One solution is to manage the effective WAL level using two distinct
GUC parameters: max_wal_level and wal_level. max_wal_level would be a
POSTMASTER parameter controlling the system's maximum allowable WAL
level, with values 'minimal', 'replica', and 'logical'. wal_level
would function as a SIGHUP parameter managing the runtime WAL level,
accepting values 'replica', 'logical', and 'auto'. The selected value
must be either 'auto' or not exceed max_wal_level. When set to 'auto',
wal_level automatically synchronizes with max_wal_level's value. This
approach would enable online WAL level transitions between 'replica'
and 'logical'.

Regarding logical decoding on standbys, currently both primary and
standby servers must have wal_level set to 'logical'. We need to
determine the appropriate behavior when users decrease the WAL level
from 'logical' to 'replica' through configuration file reload.

One approach would be to invalidate all logical replication slots on
the standby when transitioning to 'replica' WAL level. Although
incoming WAL records from the primary would still be written at
'logical' level, making logical decoding technically feasible, this
behavior seems logical as it reflects the user's intent to discontinue
logical decoding on the standby. For consistency, we might need to
invalidate logical slots during server startup if the WAL level is
insufficient.

Alternatively, we could permit logical decoding on the standby even
with wal_level set to 'replica'. However, this would necessitate
invalidating all logical replication slots during promotion,
potentially extending downtime during failover.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2025-04-21 18:20:00 Re: jsonapi: scary new warnings with LTO enabled
Previous Message Jacob Champion 2025-04-21 16:57:33 Re: [PoC] Federated Authn/z with OAUTHBEARER