Re: Accept recovery conflict interrupt on blocked writing

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Accept recovery conflict interrupt on blocked writing
Date: 2025-01-15 05:20:53
Message-ID: CA+hUKGKQqrKttbVe18LC1=cHpyRD_mAmqukmDXokP92SoUG4Og@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Bonjour Anthonin,

On Mon, Jan 13, 2025 at 11:31 PM Anthonin Bonnefoy
<anthonin(dot)bonnefoy(at)datadoghq(dot)com> wrote:
> To avoid blocking recovery for an extended period of time, this patch
> changes client write interrupts by handling recovery conflict
> interrupts instead of ignoring them. Since the interrupt happens while
> we're likely to have partially written results on the socket, there's
> no easy way to keep protocol sync so the session needs to be
> terminated.

Right. Before commit 0da096d7 in v17, the recovery conflict code
running in a signal handler would have set ProcDiePending, so this
looks like an unintended regression due to that commit.

In your patch you have CHECK_FOR_INTERRUPTS(), but I think that means
it could also service other interrupts that we probably don't want to
introduce without more study, no? For example if
ProcessRecoveryConflictInterrupts() returns without throwing, and
another pending interrupt flag is also set.

Thinking of other proc signals/interrupts, of course it's not really
acceptable that we don't process at least also global barriers here,
ie for a time controllable by a client, but let's call that an
independent problem. I think the minimal thing for this specific
problem might be to use ProcessRecoveryConflictInterrupts() directly.

I understand why you created DoingBlockingWrite, based on the variable
named DoingQueryRead. I wonder if it would be better to model it on
QueryCancelHoldoffCount, inventing QueryCancelPromoteToFatalCount, but
I'm not sure.

The read from the socket also seems to have a variant of the problem,
unless I'm missing something, except in that case I'm not sure it's
new. For sending you're proposing that our only choice is to kill the
session, which makes sense, but the equivalent read side stay-in-sync
strategy is to keep deferring until the client gets around to sending
a complete message, if I read that right.

Hmm, I wonder if we could write tests for both directions in
031_recovery_conflict.pl, using a Perl to do raw pq protocol in TCP,
as seen in some other recent proposals... I might have a crack at
that soon unless you want to.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2025-01-15 05:34:51 Re: Allow NOT VALID foreign key constraints on partitioned tables.
Previous Message Amul Sul 2025-01-15 04:50:59 Re: CREATE TABLE NOT VALID for check and foreign key