Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-02-12 07:44:38
Message-ID: Z6xRZmnbE/3hFNB/@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Feb 11, 2025 at 02:11:10PM -0800, Masahiko Sawada wrote:
> I've updated the patch that includes comment updates and bug fixes.

Thanks!

> The main idea of changing WAL level online is to decouple two aspects:
> (1) the information included in WAL records and (2) the
> functionalities available at each WAL level. With that, we can change
> the WAL level gradually. For example, when increasing the WAL level
> from 'replica' to 'logical', we first switch the WAL level on the
> shared memory to a new higher level where we allow processes to write
> WAL records with additional information required by the logical
> decoding, while keeping the logical decoding unavailable. The new
> level is something between 'replica' and 'logical'. Once we confirm
> all processes have synchronized to the new level, we increase the WAL
> level further to 'logical', allowing us to start logical decoding. The
> patch supports all combinations of WAL level transitions. It makes
> sense to me to use a background worker to proceed with this transition
> work since we need to wait at some points, rather than delegating it
> to the checkpointer process.

The background worker being added is "wal_level control worker". I wonder if
it would make sense to create a more "generic" one instead (to whom we could
assign more "tasks" later on, as suggested in the past in [1]).

+ /*
+ * XXX: Perhaps it's not okay that we failed to launch a bgworker and give
+ * up wal_level change because we already reported that the change has
+ * been accepted. Do we need to use aux process instead for that purpose?
+ */
+ if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+ ereport(WARNING,
+ (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+ errmsg("out of background worker slots"),
+ errhint("You might need to increase \"%s\".", "max_worker_processes")));

Not sure it has to be an aux process instead as it should be busy in rare occasions.

Maybe we could add some mechanism for ensuring that a bgworker slot is available
when needed (as suggested in [2])?

Not saying it has to be done that way. I just thought that the "wal_level control worker"
could be a perfect use case/starting point for a more generic one but I don't want
to over complicate that thread though.

So maybe just rename "wal_level control worker" to say "custodian worker" and
we could also think about [2]? Feel free to consider all of this as Nits if you
feel it deviates too much from the initial intend of this thread.

[1]: https://www.postgresql.org/message-id/flat/C1EE64B0-D4DB-40F3-98C8-0CED324D34CB%40amazon.com
[2]: https://www.postgresql.org/message-id/1058306.1680467858%40sss.pgh.pa.us

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shubham Khanna 2025-02-12 07:45:49 Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.
Previous Message Hayato Kuroda (Fujitsu) 2025-02-12 07:26:47 RE: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.