From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Interrupts vs signals |
Date: | 2021-11-11 05:27:09 |
Message-ID: | CA+hUKGLseL7e7EFMZZECzCijjov76PcXEGYy2swXa8jfnuV=2A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Oct 21, 2021 at 8:27 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2021-10-21 07:55:54 +1300, Thomas Munro wrote:
> > I wonder if we really need signals to implement interrupts. Given
> > that they are non-preemptive/cooperative (work happens at the next
> > CFI()), why not just use shared memory flags and latches? That skips
> > a bunch of code, global variables and scary warnings about programming
> > in signal handlers.
>
> Depending on how you implement them, one difference could be whether / when
> "slow" system calls (recv, poll, etc) are interrupted.
Hopefully by now all such waits are implemented with latch.c facilities?
> Another is that that signal handling provides a memory barrier in the
> receiving process. For things that rarely change (like most interrupts), it
> can be more efficient to move the cost of that out-of-line, instead of
> incurring them on every check.
Agreed, but in this experiment I was trying out the idea that a memory
barrier is not really needed at all, unless you're about to go to
sleep. We already insert one of those before a latch wait. That is,
if we see !set->latch->is_set, we do pg_memory_barrier() and check
again, before sleeping, so your next CFI must see the flag. For
computation loops (sort, hash, query execution, ...), I speculate that
a relaxed read of memory is fine... you'll see the flag pretty soon,
and you certainly won't be allowed to finish your computation and go
to sleep.
> One nice thing of putting the state variables into shared memory would be that
> that'd allow to see the pending interrupts of other backends for debugging
> purposes.
+1
> > One idea is to convert those into "procsignals" too, for
> > consistency. In the attached, they can be set (ie by the same
> > backend) with ProcSignalRaise(), but it's possible that in future we
> > might have a reason for another backend to set them too, so it seems
> > like a good idea to have a single system, effectively merging the
> > concepts of "procsignals" and "interrupts".
>
> This seems a bit confusing to me. For one, we need to have interrupts working
> before we have a proc, IIRC. But leaving details like that aside, it just
> seems a bit backwards to me. I'm on board with other backends directly setting
> interrupt flags, but it seems to me that the procsignal stuff should be
> "client" of the process-local interrupt infrastructure, rather than the other
> way round.
Hmm. Yeah, I see your point. But I can also think of some arguments
for merging the concepts of local and shared interrupts; see below.
In this new sketch, I tried doing it the other way around. That is,
completely removing the concept of "ProcSignal", leaving only
"Interrupts". Initially, MyPendingInterrupts points to something
private, and once you're connected to shared memory it points to
MyProc->pending_interrupts.
> > +bool
> > +ProcSignalAnyPending(void)
> > +{
> > + volatile ProcSignalSlot *slot = MyProcSignalSlot;
> >
> > - if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
> > - RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
> > + /* XXX make this static inline? */
> > + /* XXX point to a dummy entry instead of using NULL to avoid a branch */
> > + return slot && slot->pss_signaled;
> > +}
>
> ISTM it might be easier to make this stuff efficiently race-free if we made
> this a count of pending operations.
Hmm, with a unified interrupt system and a bitmap it's not necessary
to have a separate flag/counter at all.
> > @@ -3131,12 +3124,13 @@ ProcessInterrupts(void)
> > /* OK to accept any interrupts now? */
> > if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
> > return;
> > - InterruptPending = false;
> > + ProcSignalClearAnyPending();
> > +
> > + pg_read_barrier();
> >
> > - if (ProcDiePending)
> > + if (ProcSignalConsume(PROCSIG_DIE))
> > {
>
> I think making all of these checks into function calls isn't great. How about
> making the set of pending signals a bitmask? That'd allow to efficiently check
> a bunch of interrupts together and even where not, it'd just be a single test
> of the mask, likely already in a register.
+1.
Some assorted notes:
1. Aside from doing interrupts in this new way, I also have the
postmaster setting latches (!) instead of sending ad hoc SIGUSR1 here
and there. My main reason for doing that was to be able to chase out
all reasons to register a SIGUSR1 handler, so I could prove that
check-world passes. I like it, though. But maybe it's really a
separate topic.
2. I moved this stuff into interrupt.{h,c}. There is nothing left in
procsignal.c except code relating to ProcSignalBarrier. I guess that
thing could use another name, anyway. It's a ...
SystemInterruptBarrier?
3. Child-level SIGINT and SIGTERM handlers probably aren't really
necessary, either: maybe the sender could do
InterruptSend(INTERRUPT_{DIE,QUERY_CANCEL}, pgprocno) instead? But
perhaps people are attached to being able to send those signals from
external programs directly to backends.
4. Like the above, a SIGALRM handler might need to do eg
InterruptRaise(INTERRUPT_STATEMENT_TIMEOUT). That's a problem for
systems using spinlocks (self-deadlock against user context in
InterruptRaise()), so I'd need to come up with some flag protocol for
dinosaurs to make that safe, OR revert to having these "local only"
interrupts done with separate flags, as you were getting at earlier.
5. The reason I prefer to put currently "local only" interrupts into
the same atomic system is that I speculate that ultimately all of the
backend-level signal handlers won't be needed. They all fall into
three categories: (1) could be replaced with these interrupts
directly, (2) could be replaced by the new timer infrastructure that
multithreaded postgres would need to have to deliver interrupts to the
right recipients, (3) are quickdie and can be handled at the
containing process level. Then the only signal handlers left are top
level external ones.
But perhaps you're right and I should try reintroducing separate local
interrupts for now. I dunno, I like the simplicity of the unified
system; if only it weren't for those spinlock-backed atomics.
Attachment | Content-Type | Size |
---|---|---|
0001-Refactor-interrupt-system.patch | text/x-patch | 101.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2021-11-11 05:44:35 | Re: Improve logging when using Huge Pages |
Previous Message | Masahiko Sawada | 2021-11-11 05:09:42 | Re: Parallel vacuum workers prevent the oldest xmin from advancing |