Re: md.c vs elog.c vs smgrreleaseall() in barrier

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: md.c vs elog.c vs smgrreleaseall() in barrier
Date: 2025-03-19 23:06:33
Message-ID: 5glf3oa3c7wsaqwjywzdarutwir6bdk4zqaua3ue7kki7mjvbo@rdrmkvhmyhyj
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-03-19 18:45:20 -0400, Andres Freund wrote:
> On 2025-03-19 12:55:53 -0700, Noah Misch wrote:
> > On Mon, Mar 17, 2025 at 07:52:02PM -0400, Andres Freund wrote:
> > > @@ -471,6 +522,8 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
> > > if (nrels == 0)
> > > return;
> > >
> > > + HOLD_INTERRUPTS();
> >
> > I would move this below DropRelationsAllBuffers(), for reasons like
> > FlushRelationsAllBuffers() above.
>
> I think that'd be unsafe. Once we dropped buffers from the buffer pool we
> can't just continue without also unlinking the underlying relation, otherwise
> an older view of the data later can be "revived from the dead" after an error,
> causing all manner of corruption.
>
> I suspect it's always called with interrupts held already though.
>
> But unfortunately I think this probably needs to be done in a critical
> section, not just run with interrupts held.
>
> We really really shouldn't ever palloc() after doing something like
> DropRelationsAllBuffers(). Thomas just spent a lot of time fixing corruption
> issues arising for related issues around relation truncations...
>
> I think this may mean that an error during smgr_unlink() leads to a cluster in
> a corrupted state?

Ah - it effectively is already in a critical section, just a weirdly spelled one:

2025-03-19 19:00:06.398 EDT [2156613][client backend][0/3:0][psql] LOG: statement: DROP TABLE foo;
2025-03-19 19:00:06.404 EDT [2156613][client backend][0/0:43139][psql] ERROR: muahahaha
2025-03-19 19:00:06.404 EDT [2156613][client backend][0/0:43139][psql] STATEMENT: DROP TABLE foo;
2025-03-19 19:00:06.404 EDT [2156613][client backend][0/0:43139][psql] WARNING: AbortTransaction while in COMMIT state
2025-03-19 19:00:06.404 EDT [2156613][client backend][0/0:43139][psql] PANIC: cannot abort transaction 43139, it was already committed

Obviously not great, but better than corruption.

Until the IO issue preventing smgrdounlinkall() to work are fixed, the DB
doesn't start up again...

2025-03-19 19:00:57.711 EDT [2156761][startup][:0][] FATAL: muahahaha
2025-03-19 19:00:57.711 EDT [2156761][startup][:0][] CONTEXT: WAL redo at B/66D6E650 for Transaction/COMMIT: 2025-03-19 19:00:06.400748-04; rels: base/5/25449; dropped stats: 2/5/25449; inval msgs: ...
2025-03-19 19:00:57.715 EDT [2156758][postmaster][:0][] LOG: startup process (PID 2156761) exited with exit code 1
2025-03-19 19:00:57.715 EDT [2156758][postmaster][:0][] LOG: terminating any other active server processes

If one uses a large s_b and the system is busy, it's not even that unlikely
that we would fail in that spot:
mdunlinkfork()->
register_forget_request()->
RegisterSyncRequest()->
ForwardSyncRequest()->
CompactCheckpointerRequestQueue()

can require a NBuffers * sizeof(bool) array and hashtable that fits all the
entries...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2025-03-19 23:11:28 Re: [PoC] Federated Authn/z with OAUTHBEARER
Previous Message Rafael Thofehrn Castro 2025-03-19 22:53:05 Re: Proposal: Progressive explain