Re: Interrupts vs signals

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Interrupts vs signals
Date: 2024-07-08 09:38:22
Message-ID: 3b996d73-2a10-4072-a58d-0386af0cbb78@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08/07/2024 05:56, Thomas Munro wrote:
> Here's an updated version of this patch.
>
> The main idea is that SendProcSignal(pid, PROCSIGNAL_XXX, procno)
> becomes SendInterrupt(INTERRUPT_XXX, procno), and all the pending
> interrupt global variables and pss_procsignalFlags[] go away, along
> with the SIGUSR1 handler. The interrupts are compressed into a single
> bitmap. See commit message for more details.
>
> The patch is failing on Windows CI for reasons I haven't debugged yet,
> but I wanted to share what I have so far. Work in progress!
>
> Here is my attempt to survey the use of signals and write down what I
> think we might do about them all so far, to give the context for this
> patch:
>
> https://wiki.postgresql.org/wiki/Signals
>
> Comments, corrections, edits very welcome.

Nice, thanks!

> Background worker state notifications are also changed from raw
> kill(SIGUSR1) to SetLatch(). That means that SetLatch() is now called
> from the postmaster. The main purpose of including that change is to be
> able to remove procsignal_sigusr1_handler completely and set SIGUSR1 to
> SIG_IGN, and show the system working.
>
> XXX Do we need to invent SetLatchRobust() that doesn't trust anything in
> shared memory, to be able to set latches from the postmaster?

The patch actually does both: it still does kill(SIGUSR1) and also sets
the latch.

I think it would be nice if RegisterDynamicBackgroundWorker() had a
"bool notify_me" argument, instead of requiring the caller to set
"bgw_notify_pid = MyProcPid" before the call. That's a
backwards-compatibility break, but maybe we should bite the bullet and
do it. Or we could do this in RegisterDynamicBackgroundWorker():

if (worker->bgw_notify_pid == MyProcPid)
worker->bgw_notify_pgprocno = MyProcNumber;

I think we can forbid setting pgw_notify_pid to anything else than 0 or
MyProcPid.

A SetLatchRobust would be nice. Looking at SetLatch(), I don't think it
can do any damage if you called it on a pointer to garbage, except if
the pointer itself is bogus, then just dereferencing it an cause a
segfault. So it would be nice to have a version specifically designed
with that in mind. For example, it could assume that the Latch's pid is
never legally equal to MyProcPid, because postmaster cannot own any latches.

Another approach would be to move the responsibility of background
worker state notifications out of postmaster completely. When a new
background worker is launched, the worker process itself could send the
notification that it has started. And similarly, when a worker exits, it
could send the notification just before exiting. There's a little race
condition with exiting: if a process is waiting for the bgworker to
exit, and launches a new worker immediately when the old one exits,
there will be a brief period when the old and new process are alive at
the same time. The old worker wouldn't be doing anything interesting
anymore since it's exiting, but it still counts towards
max_worker_processes, so launching the new process might fail because of
hitting the limit. Maybe we should just bump up max_worker_processes. Or
postmaster could check PMChildFlags and not count processes that have
already deregistered from PMChildFlags towards the limit.

> -volatile uint32 InterruptHoldoffCount = 0;
> -volatile uint32 QueryCancelHoldoffCount = 0;
> -volatile uint32 CritSectionCount = 0;
> +uint32 InterruptHoldoffCount = 0;
> +uint32 QueryCancelHoldoffCount = 0;
> +uint32 CritSectionCount = 0;

I wondered if these are used in PG_TRY-CATCH blocks in a way that would
still require volatile. I couldn't find any such usage by some quick
grepping, so I think we're good, but I thought I'd mention it.

> +/*
> + * The set of "standard" interrupts that CHECK_FOR_INTERRUPTS() and
> + * ProcessInterrupts() handle. These perform work that is safe to run whenever
> + * interrupts are not "held". Other kinds of interrupts are only handled at
> + * more restricted times.
> + */
> +#define INTERRUPT_STANDARD_MASK \

Some interrupts are missing from this mask:

- INTERRUPT_PARALLEL_APPLY_MESSAGE
- INTERRUPT_IDLE_STATS_UPDATE_TIMEOUT
- INTERRUPT_SINVAL_CATCHUP
- INTERRUPT_NOTIFY

Is that on purpose?

> -/*
> - * Because backends sitting idle will not be reading sinval events, we
> - * need a way to give an idle backend a swift kick in the rear and make
> - * it catch up before the sinval queue overflows and forces it to go
> - * through a cache reset exercise. This is done by sending
> - * PROCSIG_CATCHUP_INTERRUPT to any backend that gets too far behind.
> - *
> - * The signal handler will set an interrupt pending flag and will set the
> - * processes latch. Whenever starting to read from the client, or when
> - * interrupted while doing so, ProcessClientReadInterrupt() will call
> - * ProcessCatchupEvent().
> - */
> -volatile sig_atomic_t catchupInterruptPending = false;

Would be nice to move that comment somewhere else rather than remove it
completely.

> --- a/src/backend/storage/lmgr/proc.c
> +++ b/src/backend/storage/lmgr/proc.c
> @@ -444,6 +444,10 @@ InitProcess(void)
> OwnLatch(&MyProc->procLatch);
> SwitchToSharedLatch();
>
> + /*We're now ready to accept interrupts from other processes. */
> + pg_atomic_init_u32(&MyProc->pending_interrupts, 0);
> + SwitchToSharedInterrupts();
> +
> /* now that we have a proc, report wait events to shared memory */
> pgstat_set_wait_event_storage(&MyProc->wait_event_info);
>
> @@ -611,6 +615,9 @@ InitAuxiliaryProcess(void)
> OwnLatch(&MyProc->procLatch);
> SwitchToSharedLatch();
>
> + /* We're now ready to accept interrupts from other processes. */
> + SwitchToSharedInterrupts();
> +
> /* now that we have a proc, report wait events to shared memory */
> pgstat_set_wait_event_storage(&MyProc->wait_event_info);
>

Is there a reason for the different initialization between a regular
backend and aux process?

> +/*
> + * Switch to shared memory interrupts. Other backends can send interrupts
> + * to this one if they know its ProcNumber.
> + */
> +void
> +SwitchToSharedInterrupts(void)
> +{
> + pg_atomic_fetch_or_u32(&MyProc->pending_interrupts, pg_atomic_read_u32(MyPendingInterrupts));
> + MyPendingInterrupts = &MyProc->pending_interrupts;
> +}

Hmm, I think there's a race condition here (and similarly in
SwitchToLocalInterrupts), if the signal handler runs between the
pg_atomic_fetch_or_u32, and changing MyPendingInterrupts. Maybe
something like this instead:

MyPendingInterrupts = &MyProc->pending_interrupts;
pg_memory_barrier();
pg_atomic_fetch_or_u32(&MyProc->pending_interrupts,
pg_atomic_read_u32(LocalPendingInterrupts));

And perhaps this should also clear LocalPendingInterrupts, just to be tidy.

> @@ -138,7 +139,8 @@
> typedef struct ProcState
> {
> /* procPid is zero in an inactive ProcState array entry. */
> - pid_t procPid; /* PID of backend, for signaling */
> + pid_t procPid; /* pid of backend */
> + ProcNumber pgprocno; /* for sending interrupts */
> /* nextMsgNum is meaningless if procPid == 0 or resetState is true. */
> int nextMsgNum; /* next message number to read */
> bool resetState; /* backend needs to reset its state */

We can easily remove procPid altogether now that we have pgprocno here.
Similarly with the pid/pgprocno in ReplicationSlot and WalSndState.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-07-08 09:41:29 RE: Conflict Detection and Resolution
Previous Message Anthonin Bonnefoy 2024-07-08 09:35:19 Re: Use pgBufferUsage for block reporting in analyze