md.c vs elog.c vs smgrreleaseall() in barrier

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: md.c vs elog.c vs smgrreleaseall() in barrier
Date: 2025-03-13 16:03:35
Message-ID: 3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

If there is any LOG/DEBUG elog inside functions like mdreadv(), mdwritev(),
mdextend(), be it directly or indirectly, the functions become unsafe. The
problem is that at the end of errfinish() there is a CFI:

/*
* Check for cancel/die interrupt first --- this is so that the user can
* stop a query emitting tons of notice or warning messages, even if it's
* in a loop that otherwise fails to check for interrupts.
*/
CHECK_FOR_INTERRUPTS();

that CFI in turn can call
ProcessProcSignalBarrier() ->
ProcessBarrierSmgrRelease() ->
smgrreleaseall()

which will free the MdfdVec that was opened at the start of mdreadv() etc.

I of course originally hit this with AIO, but it's not really AIO specific. If
I #define FDDEBUG I can quickly trigger crashes in master too. E.g.

(gdb) bt
#4 0x00005563a1bcb266 in ExceptionalCondition (conditionName=0x5563a10e2aae "FileIsValid(file)",
fileName=0x5563a10e24c8 "../../../../../home/andres/src/postgresql/src/backend/storage/file/fd.c", lineNumber=2485)
at ../../../../../home/andres/src/postgresql/src/backend/utils/error/assert.c:66
#5 0x00005563a198d10b in FilePathName (file=-917820408) at ../../../../../home/andres/src/postgresql/src/backend/storage/file/fd.c:2485
#6 0x00005563a19cd40a in mdextend (reln=0x5563c952d568, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fff13dda000, skipFsync=true)
at ../../../../../home/andres/src/postgresql/src/backend/storage/smgr/md.c:505
#7 0x00005563a19d0848 in smgrextend (reln=0x5563c952d568, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fff13dda000, skipFsync=true)
at ../../../../../home/andres/src/postgresql/src/backend/storage/smgr/smgr.c:541
#8 0x00005563a1981800 in RelationCopyStorageUsingBuffer (srclocator=..., dstlocator=..., forkNum=MAIN_FORKNUM, permanent=true)
at ../../../../../home/andres/src/postgresql/src/backend/storage/buffer/bufmgr.c:4667
#9 0x00005563a1981b8f in CreateAndCopyRelationData (src_rlocator=..., dst_rlocator=..., permanent=true)
at ../../../../../home/andres/src/postgresql/src/backend/storage/buffer/bufmgr.c:4760
#10 0x00005563a16561c7 in CreateDatabaseUsingWalLog (src_dboid=4, dst_dboid=1414794, src_tsid=1663, dst_tsid=1663)
at ../../../../../home/andres/src/postgresql/src/backend/commands/dbcommands.c:216
#11 0x00005563a16594de in createdb (pstate=0x5563c942f8b0, stmt=0x5563c9476fd8)
at ../../../../../home/andres/src/postgresql/src/backend/commands/dbcommands.c:1522
#12 0x00005563a19e05b2 in standard_ProcessUtility (pstmt=0x5563c9477068,

Right now I don't really see a solution other than putting
HOLD_INTERRUPTS()/RESUME_INTERRUPTS() into all those functions.

If we go for that, I can see an argument for doing that in smgr.c instead of
md.c, afaict any plausible smgr implementation would run into this, as the
smgrcloseall() is triggered from the smgr level.

Greetings,

Andres Freund

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-03-13 16:23:08 Re: [PATCH] Add reverse(bytea)
Previous Message Bertrand Drouvot 2025-03-13 15:38:33 Re: Draft for basic NUMA observability