Quick Links

Re: Accept recovery conflict interrupt on blocked writing

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Accept recovery conflict interrupt on blocked writing
Date:	2025-01-17 18:03:35
Message-ID:	oxbhqicvvnozgddks7s7cxhichmp7pokxykpdrnhgutzgykj7e@ryf3wzyj7ads
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 2025-01-17 17:01:53 +0100, Anthonin Bonnefoy wrote:
> I've cleaned up the tests: I've created a dedicated PgProto
> (definitely open to suggestions for a better name...) module
> containing all the helpers to send and receive messages on a raw
> socket in 0001.

Might be worth using it it in src/test/postmaster/t/002_start_stop.pl? That
has e.g. code to send a startup message.

> Also, I think the DoingBlockingWrite variable was unnecessary? Calling
> ProcessRecoveryConflictInterrupts should be enough as I don't think
> QueryCancelHoldoffCount can be >0 when writing on a socket it should always
> be able to cancel the statement.

I'd probably add an assertion to that effect, it's too easy for that to change
in a few years otherwise.

> Previously, all interrupts except process dying were ignored while a
> process was blocked writing to a socket. If the connection to the client
> was broken (no clean FIN nor RST), a process sending results to the
> client could be stuck for 924s until the TCP retransmission timeout is
> reached. During this time, it was possible for the process to conflict
> with recovery: For example, the process returning results can have a
> conflicting buffer pin.
>
> To avoid blocking recovery for an extended period of time, this patch
> changes client write interrupts by handling recovery conflict interrupts
> instead of ignoring them. Since the interrupt happens while we're likely
> to have partially written results on the socket, there's no easy way to
> keep protocol sync so the session needs to be terminated.

I don't see anything implementing the promotion of ERRORs to FATAL? You're
preventing the error message being sent to the client, but I don't think that
causes the connection to be terminated. The pre-existing code doesn't have
that problem, because it's only active when ProcDiePending is already set.

In fact, the test your patch added goes through
ProcessRecoveryConflictInterrupts() multiple times:

2025-01-17 12:52:47.842 EST [3376411] LOG: recovery still waiting after 20.709 ms: recovery conflict on buffer pin
2025-01-17 12:52:47.842 EST [3376411] CONTEXT: WAL redo at 0/3462288 for Heap2/PRUNE_VACUUM_SCAN: , isCatalogRel: F, nplans: 0, nredirected: 0, ndead: 0, nun>
3376451: recovery conflict interrupt while blocked
3376451: recovery conflict processing done
write(8192) = -1: 11/Resource temporarily unavailable
3376451: recovery conflict interrupt while blocked
3376451: recovery conflict processing done
...
write(8192) = -1: 11/Resource temporarily unavailable
3376451: recovery conflict interrupt while blocked
3376451: recovery conflict processing done
write(8192) = -1: 11/Resource temporarily unavailable
3376451: recovery conflict interrupt while blocked
2025-01-17 12:52:48.072 EST [3376451] 031_recovery_conflict.pl ERROR: canceling statement due to conflict with recovery
2025-01-17 12:52:48.072 EST [3376451] 031_recovery_conflict.pl DETAIL: User was holding shared buffer pin for too long.
2025-01-17 12:52:48.072 EST [3376451] 031_recovery_conflict.pl STATEMENT:
BEGIN;
DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM test_recovery_conflict_table1;
FETCH FORWARD FROM test_recovery_conflict_cursor;
SELECT generate_series(1, 100000);

backend 3376451> 2025-01-17 12:52:48.072 EST [3376411] LOG: recovery finished waiting after 250.681 ms: recovery conflict on buffer pin

I don't actually know why the conflict ends up being resolved after a bunch of
retries.

Note also the "backend>" (to which I added the PID to identify it) which gets
emitted. Just setting whereToSendOutput = DestNone has side effects when not
actually in a process exit status...

> @@ -581,6 +582,22 @@ ProcessClientWriteInterrupt(bool blocked)
> else
> SetLatch(MyLatch);
> }
> + else if (blocked && RecoveryConflictPending)
> + {
> + /*
> + * We're conflicting with recovery while being blocked writing. This
> + * can happen when the process is returning results and no ACK is
> + * received (broken connection, client overloaded...), eventually
> + * saturating the socket buffer while the process holds a page pin
> + * that eventually conflict with recovery.
> + */
> + if (InterruptHoldoffCount == 0 && CritSectionCount == 0)

I was about to say that we shouldn't get here with CritSectionCount != 0, but
unfortunately I'm not sure that's true. Thinking of e.g. debug messages or
such. And it's of course a copy of existing code...

Playing around with this it's unfortunately worse - we very commonly get to
ProcessClientWriteInterrupt() with InterruptHoldoffCount > 0. Including just
after processing recovery conflict interrupts. The reason for that is that if
we do throw an error the first thing we do is to hold interrupts
(c.f. sigsetjmp() in PostgresMain()) and then we call EmitErrorReport().

Unfortunately I suspect that this means any protection we'd get from this
version of the patch is rather incomplete - if the recovery conflict is
triggered while not blocked, we'll ERROR out and report the message with
interrupts held. That cycle of recovery conflict signalling wouldn't ever be
resolved in such a case, I think. Of course we do re-signal recovery conflict
interrupts rather aggressively - but I'm not sure that's going to help
reliably.

I've previously suggested that we ought to report some errors to the client by
trying to send the data in a a non-blocking way, and just terminate if we
can't send the data immediately. IIRC Tom was concerned that that would lead
to too many lost messages. Perhaps we could instead implement this with a
fairly timer that terminates connections when stuck trying to write out an
error message?

> + {
> + if (whereToSendOutput == DestRemote)
> + whereToSendOutput = DestNone;
> + ProcessRecoveryConflictInterrupts();
> + }
> + }
>

This largely duplicates the existing code in
ProcessClientWriteInterrupt(). Perhaps it's worth flipping the order of
branches so that we first check that we're neither blocked nor have
InterruptHoldoffCount == 0 && CritSectionCount == 0?

Greetings,

Andres Freund

In response to

Re: Accept recovery conflict interrupt on blocked writing at 2025-01-17 16:01:53 from Anthonin Bonnefoy

Responses

Re: Accept recovery conflict interrupt on blocked writing at 2025-01-17 18:11:13 from Andres Freund
Re: Accept recovery conflict interrupt on blocked writing at 2025-01-17 18:40:15 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2025-01-17 18:11:13	Re: Accept recovery conflict interrupt on blocked writing
Previous Message	Fujii Masao	2025-01-17 17:58:19	Re: Add “FOR UPDATE NOWAIT” lock details to the log.