Re: failures in t/031_recovery_conflict.pl on CI

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject:	Re: failures in t/031_recovery_conflict.pl on CI
Date:	2022-04-09 23:31:07
Message-ID:	20220409233107.v7okovwr46xsizfk@alap3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2022-04-09 16:10:02 -0700, Andres Freund wrote:
> It's not that (although I still suspect it's a problem). It's a self-deadlock,
> because StandbyTimeoutHandler(), which ResolveRecoveryConflictWithBufferPin()
> *explicitly enables*, calls SendRecoveryConflictWithBufferPin(). Which does
> CancelDBBackends(). Which ResolveRecoveryConflictWithBufferPin() also calls,
> if the deadlock timeout is reached.
>
> To make it easier to hit, I put a pg_usleep(10000) in CancelDBBackends(), and boom:
>
> [... backtrace ... ]
>
> it's reproducible on linux.
>
>
> I'm lacking words I dare to put in an email to describe how bad an idea it is
> to call CancelDBBackends() from within a timeout function, particularly when
> the function enabling the timeout also calls that function itself. Before
> disabling timers.

It's been broken in different ways all the way back to 9.0, from what I can
see, but I didn't check every single version.

Afaics the fix is to nuke the idea of doing anything substantial in the signal
handler from orbit, and instead just set a flag in the handler. Then check
whether the timeout happened after ProcWaitForSignal() and call
SendRecoveryConflictWithBufferPin().
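
To illustrate the shape of that fix, here is a minimal standalone sketch using plain POSIX signals rather than PostgreSQL's timeout infrastructure. Everything in it is an assumption made for illustration: got_standby_timeout, standby_timeout_handler(), send_conflict_signal() (a stand-in for SendRecoveryConflictWithBufferPin()) and wait_then_check_timeout() are hypothetical names, and sigsuspend() stands in for ProcWaitForSignal(). It is not the actual patch.

```c
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Written by the signal handler, read and reset by the normal code path. */
static volatile sig_atomic_t got_standby_timeout = 0;

/* Counts how often the stand-in for the "substantial" work has run. */
static int conflict_signals_sent = 0;

/* The handler does nothing except record that the timeout fired. */
static void
standby_timeout_handler(int signo)
{
    (void) signo;
    got_standby_timeout = 1;
}

/* Hypothetical stand-in for SendRecoveryConflictWithBufferPin(). */
static void
send_conflict_signal(void)
{
    conflict_signals_sent++;
}

/*
 * Arm a one-shot timer, wait for a signal, then act on the flag outside
 * the handler. Returns 1 if the timeout fired and was handled, 0 if the
 * wait ended for another reason, -1 on error.
 */
static int
wait_then_check_timeout(int timeout_ms)
{
    struct sigaction sa;
    struct itimerval timer;
    sigset_t    block_alarm;
    sigset_t    old_mask;
    sigset_t    wait_mask;

    assert(timeout_ms > 0);     /* a zero timer would never fire */

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = standby_timeout_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    got_standby_timeout = 0;

    /* Block SIGALRM so it can only be delivered inside sigsuspend(). */
    sigemptyset(&block_alarm);
    sigaddset(&block_alarm, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block_alarm, &old_mask) != 0)
        return -1;
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGALRM);

    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_sec = timeout_ms / 1000;
    timer.it_value.tv_usec = (timeout_ms % 1000) * 1000;
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
        return -1;

    /* Stand-in for ProcWaitForSignal(): sleep until some signal arrives. */
    sigsuspend(&wait_mask);

    /* Disarm the timer before doing any real work. */
    memset(&timer, 0, sizeof(timer));
    setitimer(ITIMER_REAL, &timer, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);

    /* Only now, in normal context, react to the timeout. */
    if (got_standby_timeout)
    {
        got_standby_timeout = 0;
        send_conflict_signal();
        return 1;
    }
    return 0;
}
```

The point of the sketch is that the handler never takes locks or calls anything non-trivial, so it cannot deadlock against the code it interrupted. The work runs exactly once, from the waiting code path, after the timer has been disarmed.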

Also worth noting that the disable_all_timeouts() call appears to break
STARTUP_PROGRESS_TIMEOUT.

Greetings,

Andres Freund
