From: | Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Accept recovery conflict interrupt on blocked writing |
Date: | 2025-01-16 13:24:41 |
Message-ID: | CAO6_XqrhfCVzAPZFMYhRR-gapJ45=FZnTLXxeQOu4M9Z+R1BWA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Bonjour Thomas,
On Wed, Jan 15, 2025 at 6:21 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Right. Before commit 0da096d7 in v17, the recovery conflict code
> running in a signal handler would have set ProcDiePending, so this
> looks like an unintended regression due to that commit.
The issue is happening on instances with 14.13. Though I haven't tried
to reproduce it yet on a version older than the current HEAD yet.
> In your patch you have CHECK_FOR_INTERRUPTS(), but I think that means
> it could also service other interrupts that we probably don't want to
> introduce without more study, no? For example if
> ProcessRecoveryConflictInterrupts() returns without throwing, and
> another pending interrupt flag is also set.
>
> Thinking of other proc signals/interrupts, of course it's not really
> acceptable that we don't process at least also global barriers here,
> ie for a time controllable by a client, but let's call that an
> independent problem. I think the minimal thing for this specific
> problem might be to use ProcessRecoveryConflictInterrupts() directly.
I think that's definitely happening, my first version used a
pg_unreachable after the CHECK_FOR_INTERRUPTS() which was triggered.
I've replaced it by a direct call to
ProcessRecoveryConflictInterrupts().
> The read from the socket also seems to have a variant of the problem,
> unless I'm missing something, except in that case I'm not sure it's
> new. For sending you're proposing that our only choice is to kill the
> session, which makes sense, but the equivalent read side stay-in-sync
> strategy is to keep deferring until the client gets around to sending
> a complete message, if I read that right.
I'm not sure the read path has the same issue. That would require a
replica to do a non command read but I don't think it's possible? A
recovering instance can't run COPY, and is there another situation
where a replica would do a blocking read without the DoingCommandRead
flag? Also, the read side is also killing the session since
DoingCommandRead will be true most of the time, falling through to the
FATAL case.
> Hmm, I wonder if we could write tests for both directions in
> 031_recovery_conflict.pl, using a Perl to do raw pq protocol in TCP,
> as seen in some other recent proposals... I might have a crack at
> that soon unless you want to.
I've tried adding a test for this. I've implemented some utility
functions to start a session, read a payload and send a simple query
using raw TCP. The implementation is definitely a bit rough as I'm far
from being fluent in perl. The socket utility functions should
probably be moved to a dedicated file and the Windows build is
currently failing. However, the test does trigger the issue with the
recovery blocked by the session blocked in ClientWrite.
Attachment | Content-Type | Size |
---|---|---|
v02-0001-Accept-recovery-conflict-interrupt-on-blocked-wr.patch | application/octet-stream | 11.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amul Sul | 2025-01-16 13:25:43 | Re: NOT ENFORCED constraint feature |
Previous Message | Matheus Alcantara | 2025-01-16 13:07:39 | Re: SCRAM pass-through authentication for postgres_fdw |