Re: Accept recovery conflict interrupt on blocked writing

From: Andres Freund <andres(at)anarazel(dot)de>
To: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Accept recovery conflict interrupt on blocked writing
Date: 2025-01-17 18:40:15
Message-ID: kc5vfjvxg4kcliebgsxcvg4ka77bujocttxmlnsex4da7eyc3c@eof67pdoxg33
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-01-17 13:03:35 -0500, Andres Freund wrote:
> > Previously, all interrupts except process dying were ignored while a
> > process was blocked writing to a socket. If the connection to the client
> > was broken (no clean FIN nor RST), a process sending results to the
> > client could be stuck for 924s until the TCP retransmission timeout is
> > reached. During this time, it was possible for the process to conflict
> > with recovery: For example, the process returning results can have a
> > conflicting buffer pin.
> >
> > To avoid blocking recovery for an extended period of time, this patch
> > changes client write interrupts by handling recovery conflict interrupts
> > instead of ignoring them. Since the interrupt happens while we're likely
> > to have partially written results on the socket, there's no easy way to
> > keep protocol sync so the session needs to be terminated.
>
> I don't see anything implementing the promotion of ERRORs to FATAL? You're
> preventing the error message being sent to the client, but I don't think that
> causes the connection to be terminated. The pre-existing code doesn't have
> that problem, because it's only active when ProcDiePending is already set.

The easiest way to do that promotion is probably to set ExitOnAnyError to
true before entering ProcessRecoveryConflictInterrupts(). That does prevent
ending up in InteractiveBackend().

But I suspect this won't suffice:

> > @@ -581,6 +582,22 @@ ProcessClientWriteInterrupt(bool blocked)
> > else
> > SetLatch(MyLatch);
> > }
> > + else if (blocked && RecoveryConflictPending)
> > + {
> > + /*
> > + * We're conflicting with recovery while being blocked writing. This
> > + * can happen when the process is returning results and no ACK is
> > + * received (broken connection, client overloaded...), eventually
> > + * saturating the socket buffer while the process holds a page pin
> > + * that eventually conflict with recovery.
> > + */
> > + if (InterruptHoldoffCount == 0 && CritSectionCount == 0)
>
> I was about to say that we shouldn't get here with CritSectionCount != 0, but
> unfortunately I'm not sure that's true. Thinking of e.g. debug messages or
> such. And it's of course a copy of existing code...
>
> Playing around with this it's unfortunately worse - we very commonly get to
> ProcessClientWriteInterrupt() with InterruptHoldoffCount > 0. Including just
> after processing recovery conflict interrupts. The reason for that is that if
> we do throw an error the first thing we do is to hold interrupts
> (c.f. sigsetjmp() in PostgresMain()) and then we call EmitErrorReport().
>
> Unfortunately I suspect that this means any protection we'd get from this
> version of the patch is rather incomplete - if the recovery conflict is
> triggered while not blocked, we'll ERROR out and report the message with
> interrupts held. That cycle of recovery conflict signalling wouldn't ever be
> resolved in such a case, I think. Of course we do re-signal recovery conflict
> interrupts rather aggressively - but I'm not sure that's going to help
> reliably.

This unfortunately indeed is true. If I, instead of the generate_series(),
add the following to the pgproto query:
DO \$\$BEGIN RAISE 'endless scream: %', repeat('a', 1000000); END;\$\$;

the conflict doesn't get handled.

Example log, with a fair bit of added debugging added:

2025-01-17 13:31:46.226 EST [3407842] 031_recovery_conflict.pl ERROR: endless scream: aa...
2025-01-17 13:31:46.226 EST [3407842] 031_recovery_conflict.pl CONTEXT: PL/pgSQL function inline_code_block line 1 at RAISE
2025-01-17 13:31:46.226 EST [3407842] 031_recovery_conflict.pl STATEMENT:
BEGIN;
DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM test_recovery_conflict_table1;
FETCH FORWARD FROM test_recovery_conflict_cursor;
DO $$BEGIN RAISE 'endless scream: %', repeat('a', 1000000); END;$$;

3407842: start fronted error
3407842: ProcessClientWriteInterrupt blocked: 0, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 0
3407842: write(8192) = 8192: 0/Success
3407842: ProcessClientWriteInterrupt blocked: 0, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 0
3407842: ProcessClientWriteInterrupt blocked: 0, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 0
3407842: write(992023) = 219264: 0/Success
3407842: ProcessClientWriteInterrupt blocked: 0, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 0
3407842: ProcessClientWriteInterrupt blocked: 0, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 0
3407842: write(772759) = -1: 11/Resource temporarily unavailable
2025-01-17 13:31:46.231 EST [3407800] DEBUG: reaping dead processes
2025-01-17 13:31:46.232 EST [3407800] DEBUG: releasing pm child slot 3
2025-01-17 13:31:46.232 EST [3407800] DEBUG: client backend (PID 3407844) exited with exit code 0
2025-01-17 13:31:46.407 EST [3407804] DEBUG: sendtime 2025-01-17 13:31:46.407326-05 receipttime 2025-01-17 13:31:46.40737-05 replication apply delay 0 ms transfer latency 1 ms
2025-01-17 13:31:46.407 EST [3407804] DEBUG: sending write 0/3460F78 flush 0/3460DA0 apply 0/3460DA0
2025-01-17 13:31:46.407 EST [3407804] DEBUG: sending write 0/3460F78 flush 0/3460F78 apply 0/3460DA0
3407842: ProcessClientWriteInterrupt blocked: 1, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 1
3407832: !HoldingBufferPinThatDelaysRecovery
3407842: write(772759) = -1: 11/Resource temporarily unavailable
2025-01-17 13:31:46.707 EST [3407803] LOG: recovery still waiting after 300.099 ms: recovery conflict on buffer pin
2025-01-17 13:31:46.707 EST [3407803] CONTEXT: WAL redo at 0/3460DA0 for Heap2/PRUNE_VACUUM_SCAN: , isCatalogRel: F, nplans: 0, nredirected: 0, ndead: 0, nunused: 1, unused: [21]; blkref #0: rel 1663/16385/16386, blk 0
3407832: !HoldingBufferPinThatDelaysRecovery
3407842: ProcessClientWriteInterrupt blocked: 1, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 1
3407842: write(772759) = -1: 11/Resource temporarily unavailable
3407842: ProcessClientWriteInterrupt blocked: 1, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 1
3407842: write(772759) = -1: 11/Resource temporarily unavailable
2025-01-17 13:31:56.417 EST [3407804] DEBUG: sending write 0/3460F78 flush 0/3460F78 apply 0/3460DA0
2025-01-17 13:32:06.427 EST [3407804] DEBUG: sending write 0/3460F78 flush 0/3460F78 apply 0/3460DA0
2025-01-17 13:32:07.224 EST [3407804] DEBUG: sendtime 2025-01-17 13:32:07.224295-05 receipttime 2025-01-17 13:32:07.224413-05 replication apply delay 20817 ms transfer latency 1 ms

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2025-01-17 19:02:15 Re: [PoC] Federated Authn/z with OAUTHBEARER
Previous Message Tom Lane 2025-01-17 18:27:36 Re: SQLFunctionCache and generic plans