Quick Links

Re: md.c vs elog.c vs smgrreleaseall() in barrier

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: md.c vs elog.c vs smgrreleaseall() in barrier
Date:	2025-03-13 22:31:47
Message-ID:	CA+hUKG+HLL+Q6Q04R1UJc9fYo=z2ymZFHf0HJWat7-PdFCb+Gw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Mar 14, 2025 at 5:03 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> If there is any LOG/DEBUG elog inside functions like mdreadv(), mdwritev(),
> mdextend(), be it directly or indirectly, the functions become unsafe. The
> problem is that at the end of errfinish() there is a CFI:
>
> /*
> * Check for cancel/die interrupt first --- this is so that the user can
> * stop a query emitting tons of notice or warning messages, even if it's
> * in a loop that otherwise fails to check for interrupts.
> */
> CHECK_FOR_INTERRUPTS();
>
> that CFI in turn can call
> ProcessProcSignalBarrier() ->
> ProcessBarrierSmgrRelease() ->
> smgrreleaseall()
>
> which will free the MdfdVec that was opened at the start of mdreadv() etc.

> Right now I don't really see a solution other than putting
> HOLD_INTERRUPTS()/RESUME_INTERRUPTS() into all those functions.

Some other ideas:

1. An smgr-local smgr-release-hold flag, which would just cause
ProcessBarrierSmgrReleaseAll() to return false. Main complication
seems to be making sure you reset it on error, which is a bit
annoying, elog.c probably needs to know how to clean it up. Yuck.

2. Somehow tell elog.c not to call CHECK_FOR_INTERRUPTS() instead.
Also yuck, but at least elog.c is already the right place to clean
state up on non-local exit...

3. Considering errfinish()'s stated goal, a sort of backstop to help
you cancel looping message-spewing code only, I wonder if we should
just restrict the kinds of interrupts it processes, so that it only
calls CFI() if we're definitely throwing ERROR or FATAL? Something
like CHECK_FOR_CANCEL_OR_DIE() which would enter the regular code only
if it can already determine that, which would probably be defined as
something like: if (INTERRUPTS_PENDING_CONDITION() &&
INTERRUPTS_CAN_BE_PROCESSED() && (QueryCancelPending ||
ProcDiePending)) ProcessInterrupts(). I realise that what I'm
describing is essentially what CHECK_FOR_INTERRUPTS() used to be like,
before a bunch of non-cancel, non-die stuff moved into
CHECK_FOR_INTERRUPTS(), but that's probably the conditions under which
that code was written. Could anything else be accidentally on elog()
CFIs? Seems pretty weird?

4. Bigger refactoring of the interrupts system so you can express
more precisely what kinds of stuff you want to handle. Well, we have
some stuff like that coming in the nearby procsignal/interrupt
refactoring thread. I don't feel enthusiastic about messing with it
in the back branches, hence easier suggestion in #3.

> If we go for that, I can see an argument for doing that in smgr.c instead of
> md.c, afaict any plausible smgr implementation would run into this, as the
> smgrcloseall() is triggered from the smgr level.

Seems like maybe not a great idea to assume that future smgrs will be
OK with being non-interruptible, if it can be avoided?

In response to

md.c vs elog.c vs smgrreleaseall() in barrier at 2025-03-13 16:03:35 from Andres Freund

Responses

Re: md.c vs elog.c vs smgrreleaseall() in barrier at 2025-03-13 23:23:02 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Abhishek Chanda	2025-03-13 23:02:35	Re: Adding support for SSLKEYLOGFILE in the frontend
Previous Message	Daniel Gustafsson	2025-03-13 22:07:33	Re: Adding support for SSLKEYLOGFILE in the frontend