Re: Inconsistency in startup process's MyBackendId and procsignal array registration with ProcSignalInit()

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Inconsistency in startup process's MyBackendId and procsignal array registration with ProcSignalInit()
Date: 2021-10-11 10:33:57
Message-ID: CALj2ACVKqmD6+wcWe0VFyqmbwh2vY_n-zzLSCDXEYXn5ro5g8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 11, 2021 at 12:41 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> > > These are the following ways I think we can fix it, if at all some
> > > other hacker agrees that it is actually an issue:
> > > 1) Fix the startup process code, probably by unregistering the
> > > procsignal array entry that was made with ProcSignalInit(MaxBackends +
> > > MyAuxProcType + 1) in AuxiliaryProcessMain() and register with
> > > ProcSignalInit(MyBackendId) immediately after SharedInvalBackendInit()
> > > calculates the MyBackendId in with SharedInvalBackendInit() in
> > > InitRecoveryTransactionEnvironment(). This seems risky to me as
> > > unregistering and registering ProcSignal array involves some barriers
> > > and during the registering and unregistering window, the startup
> > > process may miss the SIGUSR1.
> > > 2) Ensure that the process, that wants to send the startup process
> > > SIGUSR1 signal, doesn't use the backendId from the startup process
> > > PGPROC, in which case it has to loop over all the entries of
> > > ProcSignal->psh_slot[] array to find the entry with the startup
> > > process PID. It seems easier and less riskier but only caveat is that
> > > the sending process shouldn't look at the backendId from auxiliary
> > > process PGPROC, instead it should just traverse the entire proc signal
> > > array to find the right slot.
> > > 3) Add a comment around AuxiliaryPidGetProc() that says "auxiliary
> > > processes don't have valid backend ids, so don't use the backendId
> > > from the returned PGPROC".
> > > (2) and (3) seem reasonable to me. Thoughts?
>
> (I'm not sure how the trouble happens.)

The order in which SharedInvalBackendInit and ProcSignalInit should be
used to keep procState and ProcSignal array entries for backend and
auxiliary processes is as follows:
call SharedInvalBackendInit() and let it calculate the MyBackendId
call ProcSignalInit(MyBackendId);

But for the startup process it does the opposite way, so the procState
and ProcSignal array entries are not in sync.
call ProcSignalInit(MaxBackends + MyAuxProcType + 1);
call SharedInvalBackendInit() and let it calculate the MyBackendId

If some process wants to send the startup process SIGUSR1 with the
PGPROC->backendId and SendProcSignal(PGPROC->pid, XXXXX,
PGPROC->backendId) after getting the PGPROC entry from
AuxiliaryPidGetProc(), then the signal isn't sent. To understand this
issue, please use a sample patch at [1], have a standby setup, call
pg_log_backend_memory_contexts with startup process pid on the
standby, the error "could not send signal to process" is shown, see
[2].

[1]
diff --git a/src/backend/utils/adt/mcxtfuncs.c
b/src/backend/utils/adt/mcxtfuncs.c
index 0d52613bc3..2739591edc 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -185,6 +185,10 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)

proc = BackendPidGetProc(pid);

+ /* see if the given process is an auxiliary process. */
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
/*
* BackendPidGetProc returns NULL if the pid isn't valid; but
by the time
* we reach kill(), a process for which we get a valid proc here might

[2]
postgres=# select pg_log_backend_memory_contexts(726901);
WARNING: could not send signal to process 726901: No such process
pg_log_backend_memory_contexts
--------------------------------
f
(1 row)

> > How about modifying SharedInvalBackendInit() so that it accepts
> > BackendId as an argument and allocates the ProcState entry of
> > the specified BackendId? That is, the startup process determines
> > that its BackendId is "MaxBackends + MyAuxProcType (=StartupProcess) +
> > 1"
> > in AuxiliaryProcessMain(), and then it passes that BackendId to
> > SharedInvalBackendInit() in InitRecoveryTransactionEnvironment().
> >
> > Maybe you need to enlarge ProcState array so that it also handles
> > auxiliary processes if it does not for now.
>
> It seems to me that the backendId on startup process is used only for
> vxid generation. Actually "make check-world" doesn't fail by skipping
> SharedInvalBackendinit() (and disabling an assertion).
>
> I thought that we could decouple vxid from backend ID (or sinval code)
> by using pgprocno for vxid generation instead of backend ID. "make
> check-world" doesn't fail with that change, too. (I don't think
> "doesn't fail" ncecessarily mean that that change is correct, though),
> but vxid gets somewhat odd after the change..
>
> =# select distinct virtualxid from pg_locks;
> virtualxid
> ------------
>
> 116/1 # startup
> 99/48 # backend 1

I'm not sure we go that path. Others may have better thoughts.

Regards,
Bharath Rupireddy.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rajkumar Raghuwanshi 2021-10-11 10:42:32 Re: Multi-Column List Partitioning
Previous Message Markus Winand 2021-10-11 10:22:41 Re: EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST fails