Re: md.c vs elog.c vs smgrreleaseall() in barrier

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: md.c vs elog.c vs smgrreleaseall() in barrier
Date: 2025-03-19 22:45:20
Message-ID: af3gdqg3guov3flyze2gjjphrrfbnusjesverlknn4tpxtafkh@vfpat6ffutek
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-03-19 12:55:53 -0700, Noah Misch wrote:
> On Mon, Mar 17, 2025 at 07:52:02PM -0400, Andres Freund wrote:
> > Here's a proposed patch for this. It turns out that the bug might already be
> > reachable, even without defining FDDEBUG. There's a debug ereport() in
> > register_dirty_segment() - but it's hard to reach in practice.
> >
> > I don't really know whether that means we ought to backpatch or not - which
> > makes me want to be conservative and not backpatch.
>
> Non-backpatch sounds fine.

Cool.

> > @@ -362,12 +397,16 @@ smgrreleaseall(void)
> > if (SMgrRelationHash == NULL)
> > return;
> >
> > + HOLD_INTERRUPTS();
>
> Likely not important, but it's not clear to me why smgrdestroyall() and
> smgrreleaseall() get HOLD_INTERRUPTS(), as opposed to relying on the holds in
> smgrdestroy() and smgrrelease(). In contrast, smgrreleaserellocator() does
> rely on smgrrelease() for the hold.

It didn't seem particularly safe to allow interrupts, which in turn could
change the list of open relations, while iterating over a linked list / a
hashtable.

> > @@ -434,6 +481,8 @@ smgrdosyncall(SMgrRelation *rels, int nrels)
> > if (nrels == 0)
> > return;
> >
> > + HOLD_INTERRUPTS();
> > +
> > FlushRelationsAllBuffers(rels, nrels);
>
> FlushRelationsAllBuffers() isn't part of smgr or md.c, so it's unlikely to
> become sensitive to smgrrelease(). It may do a ton of I/O. Hence, I'd
> HOLD_INTERRUPTS() after FlushRelationsAllBuffers(), not before.

Hm - we never would want to process interrupts while in
FlushRelationsAllBuffers() or such, would we? I'm ok with changing it, I
guess I just didn't see a reason not to use a wider scope.

I guess I am a bit paranoid because gave me flashbacks to issues around
smgrtruncate() failing after doing DropRelationBuffers(), that Thomas recently
fixed (and I had worked on a few years before). But of course
DropRelationBuffers() is way more dangerous than FlushRelationsAllBuffers().

> > }
> > }
> > +
> > + RESUME_INTERRUPTS();
> > }
> >
> > /*
> > @@ -471,6 +522,8 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
> > if (nrels == 0)
> > return;
> >
> > + HOLD_INTERRUPTS();
>
> I would move this below DropRelationsAllBuffers(), for reasons like
> FlushRelationsAllBuffers() above.

I think that'd be unsafe. Once we dropped buffers from the buffer pool we
can't just continue without also unlinking the underlying relation, otherwise
an older view of the data later can be "revived from the dead" after an error,
causing all manner of corruption.

I suspect it's always called with interrupts held already though.

But unfortunately I think this probably needs to be done in a critical
section, not just run with interrupts held.

We really really shouldn't ever palloc() after doing something like
DropRelationsAllBuffers(). Thomas just spent a lot of time fixing corruption
issues arising for related issues around relation truncations...

I think this may mean that an error during smgr_unlink() leads to a cluster in
a corrupted state?

> > Subject: [PATCH v2 2/2] smgr: Make SMgrRelation initialization safer against
> > errors
> >
> > In case the smgr_open callback failed, the ->pincount field would not be
> > initialized and the relation would not be put onto the unpinned_relns list.
> >
> > This buglet was introduced in 21d9c3ee4ef7. As that commit is only in HEAD, no
> > need to backpatch.
> >
> > Discussion: https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl
> > ---
> > src/backend/storage/smgr/smgr.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
> > index 53a09fe4aaa..24971304b85 100644
> > --- a/src/backend/storage/smgr/smgr.c
> > +++ b/src/backend/storage/smgr/smgr.c
> > @@ -255,12 +255,12 @@ smgropen(RelFileLocator rlocator, ProcNumber backend)
> > reln->smgr_cached_nblocks[i] = InvalidBlockNumber;
> > reln->smgr_which = 0; /* we only have md.c at present */
> >
> > - /* implementation-specific initialization */
> > - smgrsw[reln->smgr_which].smgr_open(reln);
> > -
> > /* it is not pinned yet */
> > reln->pincount = 0;
> > dlist_push_tail(&unpinned_relns, &reln->node);
> > +
> > + /* implementation-specific initialization */
> > + smgrsw[reln->smgr_which].smgr_open(reln);
> > }
>
> I struggle to speculate about the merits of this, because mdopen() can't fail.
> If mdopen() started to do things that could fail, mdnblocks() would be
> reasonable in assuming those things are done. Hence, the long-term direction
> should be more like destroying the new smgr entry in the event of error.
>
> I would not make this change. I'd maybe add a comment that smgr_open
> callbacks currently aren't allowed to fail, since smgropen() isn't ready to
> clean up smgr-level state from a failed open. How do you see it?

I see no disadvantage in the change - it seems strictly better to initialize
pincount earlier. I agree that it might be a good idea to explicitly handle
errors eventually, but that'd not be made harder by this change...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rafael Thofehrn Castro 2025-03-19 22:53:05 Re: Proposal: Progressive explain
Previous Message Corey Huinker 2025-03-19 22:35:52 Re: Statistics Import and Export