From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-02-17 20:07:56 |
Message-ID: | CAD21AoCPc+pEgb0pJeiS2CU39ad8VW-10Ze7Uii=1RRjfgQ0uw@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Feb 14, 2025 at 2:35 AM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Fri, Feb 14, 2025 at 12:17:48AM -0800, Masahiko Sawada wrote:
> > On Tue, Feb 11, 2025 at 11:44 PM Bertrand Drouvot
> > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> > Looking at the latest custodian worker patch, the basic architecture
> > is to have a single custodian worker that processes can ask to do some
> > work, such as removing logical decoding related files. The online
> > wal_level change would be one of the tasks that processes (esp. the
> > checkpointer) can ask it to do. On the other hand, one point that I
> > think might not fit this wal_level work well is that while the
> > custodian worker is a long-lived worker process,
>
> That was the case initially, but it looks like it would not have been the case
> in the end. See Tom's comment in [1]:
>
> "
> I wonder if a single long-lived custodian task is the right model at all.
> At least for RemovePgTempFiles, it'd make more sense to write it as a
> background worker that spawns, does its work, and then exits,
> independently of anything else
> "
>
> > it's sufficient for
> > the online wal_level change work to have a bgworker that does its work
> > and then exits.
>
> Fully agree and I did not think about changing this behavior.
>
> > IOW, from the perspective of this work, I prefer the
> > idea of having one short-lived worker for one task over having one
> > long-lived worker for multiple tasks.
>
> Yeah, or one short-lived worker for multiple tasks could work too. It would just
> start when it has something to do and then exit.
>
> > Reading that thread, while we
> > need to resolve the XID wraparound issue for the work of removing
> > logical decoding related files, the work of removing temporary files
> seems to fit a short-lived worker style. So I thought that, as one
> direction, it might be worth considering an infrastructure where we
> can launch a bgworker just for one task, and implementing the online
> wal_level change and temporary file removal on top of it.
>
> Yeap, that was exactly my point when I mentioned the custodian thread (taking
> into account Tom's comment quoted above).
>
I've written PoC patches to have the online wal_level change work use
a more generic infrastructure. These patches are still in PoC state
but seem like a good direction to me. Here is a brief explanation of
each patch.
* The 0001 patch introduces "reserved background worker slots". We
allocate max_worker_processes + BGWORKER_CLASS_RESERVED slots at startup,
and if the number of running bgworkers exceeds max_worker_processes, only
workers using the reserved slots can be launched. A worker can request
the reserved slots by setting the BGWORKER_CLASS_RESERVED flag at
bgworker registration.
* The 0002 patch introduces the "bgtask worker". The bgtask infrastructure
is designed to execute internal tasks in the background in a
one-worker-per-task style. Internally, bgtask workers use the reserved
bgworker slots, so they are guaranteed to be able to launch. The
internal tasks that can be requested are predefined, and this patch has a
dummy task as a placeholder. This patch implements only the minimal
functionality needed for the online wal_level change work. I've not tested
whether this bgtask infrastructure can be used for the tasks that we wanted
to offload to the custodian worker.
* The 0003 patch makes wal_level a SIGHUP parameter. The online
wal_level change work is done using the bgtask infrastructure. Other
than that, there are no major changes from the previous version.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v3-0003-PoC-Convert-wal_level-a-PGC_SIGHUP-parameter.patch | application/x-patch | 49.9 KB |
v3-0002-Introduce-bgtask-infrastructure-to-perform-tasks-.patch | application/x-patch | 12.5 KB |
v3-0001-Introduce-reserved-background-worker-slots.patch | application/x-patch | 19.0 KB |