Re: Accept recovery conflict interrupt on blocked writing

From: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Accept recovery conflict interrupt on blocked writing
Date: 2025-01-16 13:24:41
Message-ID: CAO6_XqrhfCVzAPZFMYhRR-gapJ45=FZnTLXxeQOu4M9Z+R1BWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Bonjour Thomas,

On Wed, Jan 15, 2025 at 6:21 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Right. Before commit 0da096d7 in v17, the recovery conflict code
> running in a signal handler would have set ProcDiePending, so this
> looks like an unintended regression due to that commit.

The issue is happening on instances with 14.13. Though I haven't tried
to reproduce it yet on a version older than the current HEAD yet.

> In your patch you have CHECK_FOR_INTERRUPTS(), but I think that means
> it could also service other interrupts that we probably don't want to
> introduce without more study, no? For example if
> ProcessRecoveryConflictInterrupts() returns without throwing, and
> another pending interrupt flag is also set.
>
> Thinking of other proc signals/interrupts, of course it's not really
> acceptable that we don't process at least also global barriers here,
> ie for a time controllable by a client, but let's call that an
> independent problem. I think the minimal thing for this specific
> problem might be to use ProcessRecoveryConflictInterrupts() directly.

I think that's definitely happening, my first version used a
pg_unreachable after the CHECK_FOR_INTERRUPTS() which was triggered.
I've replaced it by a direct call to
ProcessRecoveryConflictInterrupts().

> The read from the socket also seems to have a variant of the problem,
> unless I'm missing something, except in that case I'm not sure it's
> new. For sending you're proposing that our only choice is to kill the
> session, which makes sense, but the equivalent read side stay-in-sync
> strategy is to keep deferring until the client gets around to sending
> a complete message, if I read that right.

I'm not sure the read path has the same issue. That would require a
replica to do a non command read but I don't think it's possible? A
recovering instance can't run COPY, and is there another situation
where a replica would do a blocking read without the DoingCommandRead
flag? Also, the read side is also killing the session since
DoingCommandRead will be true most of the time, falling through to the
FATAL case.

> Hmm, I wonder if we could write tests for both directions in
> 031_recovery_conflict.pl, using a Perl to do raw pq protocol in TCP,
> as seen in some other recent proposals... I might have a crack at
> that soon unless you want to.

I've tried adding a test for this. I've implemented some utility
functions to start a session, read a payload and send a simple query
using raw TCP. The implementation is definitely a bit rough as I'm far
from being fluent in perl. The socket utility functions should
probably be moved to a dedicated file and the Windows build is
currently failing. However, the test does trigger the issue with the
recovery blocked by the session blocked in ClientWrite.

Attachment Content-Type Size
v02-0001-Accept-recovery-conflict-interrupt-on-blocked-wr.patch application/octet-stream 11.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2025-01-16 13:25:43 Re: NOT ENFORCED constraint feature
Previous Message Matheus Alcantara 2025-01-16 13:07:39 Re: SCRAM pass-through authentication for postgres_fdw