RE: Perform streaming logical transactions by background workers and parallel apply

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2023-05-08 03:52:44
Message-ID: OS0PR01MB57165D850B0262D13141D63E94719@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday, May 8, 2023 11:08 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>

Hi,

>
> On Tue, May 2, 2023 at 12:22 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > On Fri, Apr 28, 2023 at 11:48 AM Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > While investigating this issue, I've reviewed the code around
> > > callbacks and worker termination etc and I found a problem.
> > >
> > > A parallel apply worker calls the before_shmem_exit callbacks in the
> > > following order:
> > >
> > > 1. ShutdownPostgres()
> > > 2. logicalrep_worker_onexit()
> > > 3. pa_shutdown()
> > >
> > > Since the worker is detached during logicalrep_worker_onexit(),
> > > MyLogicalReplication->leader_pid is an invalid when we call
> > > pa_shutdown():
> > >
> > > static void
> > > pa_shutdown(int code, Datum arg)
> > > {
> > > Assert(MyLogicalRepWorker->leader_pid != InvalidPid);
> > > SendProcSignal(MyLogicalRepWorker->leader_pid,
> > > PROCSIG_PARALLEL_APPLY_MESSAGE,
> > > InvalidBackendId);
> > >
> > > Also, if the parallel apply worker fails shm_toc_lookup() during the
> > > initialization, it raises an error (because of noError = false) but
> > > ends up a SEGV as MyLogicalRepWorker is still NULL.
> > >
> > > I think that we should not use MyLogicalRepWorker->leader_pid in
> > > pa_shutdown() but instead store the leader's pid to a static variable
> > > before registering pa_shutdown() callback.
> > >
> >
> > Why not simply move the registration of pa_shutdown() to someplace
> > after logicalrep_worker_attach()?
>
> If we do that, the worker won't call dsm_detach() if it raises an
> ERROR in logicalrep_worker_attach(), is that okay? It seems that it's
> no practically problem since we call dsm_backend_shutdown() in
> shmem_exit(), but if so why do we need to call it in pa_shutdown()?

I think the dsm_detach in pa_shutdown was intended to fire on_dsm_detach
callbacks to give callback a chance to report stat before the stat system is
shutdown, following what we do in ParallelWorkerShutdown() (e.g.
sharedfileset.c callbacks cause fd.c to do ReportTemporaryFileUsage(), so we
need to fire that earlier).

But for parallel apply, we currently only have one on_dsm_detach
callback(shm_mq_detach_callback) which doesn't report extra stats. So the
dsm_detach in pa_shutdown is only used to make it a bit future-proof in case
we add some other on_dsm_detach callbacks in the future which need to report
stats.

Best regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-05-08 03:57:33 Re: PGDOCS - Replica Identity quotes
Previous Message Peter Smith 2023-05-08 03:38:53 Re: [DOC] Update ALTER SUBSCRIPTION documentation