Quick Links

Re: Accept recovery conflict interrupt on blocked writing

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Accept recovery conflict interrupt on blocked writing
Date:	2025-01-16 13:24:41
Message-ID:	CAO6_XqrhfCVzAPZFMYhRR-gapJ45=FZnTLXxeQOu4M9Z+R1BWA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Bonjour Thomas,

On Wed, Jan 15, 2025 at 6:21 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Right. Before commit 0da096d7 in v17, the recovery conflict code
> running in a signal handler would have set ProcDiePending, so this
> looks like an unintended regression due to that commit.

The issue is happening on instances with 14.13. Though I haven't tried
to reproduce it yet on a version older than the current HEAD yet.

> In your patch you have CHECK_FOR_INTERRUPTS(), but I think that means
> it could also service other interrupts that we probably don't want to
> introduce without more study, no? For example if
> ProcessRecoveryConflictInterrupts() returns without throwing, and
> another pending interrupt flag is also set.
>
> Thinking of other proc signals/interrupts, of course it's not really
> acceptable that we don't process at least also global barriers here,
> ie for a time controllable by a client, but let's call that an
> independent problem. I think the minimal thing for this specific
> problem might be to use ProcessRecoveryConflictInterrupts() directly.

I think that's definitely happening, my first version used a
pg_unreachable after the CHECK_FOR_INTERRUPTS() which was triggered.
I've replaced it by a direct call to
ProcessRecoveryConflictInterrupts().

> The read from the socket also seems to have a variant of the problem,
> unless I'm missing something, except in that case I'm not sure it's
> new. For sending you're proposing that our only choice is to kill the
> session, which makes sense, but the equivalent read side stay-in-sync
> strategy is to keep deferring until the client gets around to sending
> a complete message, if I read that right.

I'm not sure the read path has the same issue. That would require a
replica to do a non command read but I don't think it's possible? A
recovering instance can't run COPY, and is there another situation
where a replica would do a blocking read without the DoingCommandRead
flag? Also, the read side is also killing the session since
DoingCommandRead will be true most of the time, falling through to the
FATAL case.

> Hmm, I wonder if we could write tests for both directions in
> 031_recovery_conflict.pl, using a Perl to do raw pq protocol in TCP,
> as seen in some other recent proposals... I might have a crack at
> that soon unless you want to.

I've tried adding a test for this. I've implemented some utility
functions to start a session, read a payload and send a simple query
using raw TCP. The implementation is definitely a bit rough as I'm far
from being fluent in perl. The socket utility functions should
probably be moved to a dedicated file and the Windows build is
currently failing. However, the test does trigger the issue with the
recovery blocked by the session blocked in ClientWrite.

Attachment	Content-Type	Size
v02-0001-Accept-recovery-conflict-interrupt-on-blocked-wr.patch	application/octet-stream	11.6 KB

In response to

Re: Accept recovery conflict interrupt on blocked writing at 2025-01-15 05:20:53 from Thomas Munro

Responses

Re: Accept recovery conflict interrupt on blocked writing at 2025-01-17 16:01:53 from Anthonin Bonnefoy

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Amul Sul	2025-01-16 13:25:43	Re: NOT ENFORCED constraint feature
Previous Message	Matheus Alcantara	2025-01-16 13:07:39	Re: SCRAM pass-through authentication for postgres_fdw