From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: md.c vs elog.c vs smgrreleaseall() in barrier |
Date: | 2025-03-17 23:52:02 |
Message-ID: | kpopzahilmjtcpeefrsbrtzvwddzbhadsfn3cvdelnpi24gbgq@5ty2xbzoyua3 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2025-03-14 12:57:51 +1300, Thomas Munro wrote:
> On Fri, Mar 14, 2025 at 12:23 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > > 3. Considering errfinish()'s stated goal, a sort of backstop to help
> > > you cancel looping message-spewing code only, I wonder if we should
> > > just restrict the kinds of interrupts it processes, so that it only
> > > calls CFI() if we're definitely throwing ERROR or FATAL?
> >
> > I'm not even sure it'd be safe to ERROR out in all the relevant places...
> >
> > E.g.
> > /* implementation-specific initialization */
> > smgrsw[reln->smgr_which].smgr_open(reln);
> >
> > /* it is not pinned yet */
> > reln->pincount = 0;
> > dlist_push_tail(&unpinned_relns, &reln->node);
> >
> > doesn't this mean that ->pincount is uninitialized in case smgr_open() errors
> > out and that we'd loose track of the reln because it wasn't yet added to
> > unpinned_rels?
>
> Ugh, right.
Patch for that attached.
> > > > If we go for that, I can see an argument for doing that in smgr.c instead of
> > > > md.c, afaict any plausible smgr implementation would run into this, as the
> > > > smgrcloseall() is triggered from the smgr level.
> > >
> > > Seems like maybe not a great idea to assume that future smgrs will be
> > > OK with being non-interruptible, if it can be avoided?
> >
> > I think you'd need a fairly large surgery of smgr.c to make that viable - I
> > rather doubt that smgr.c itself is actually safe against interrupts.
> >
> > I can see some arguable uses of smgr handling interrupts, say an smgr
> > implementation based on a network backed store, but you'd need rather massive
> > changes to actually be sure that smgr.c is called while accepting interrupts -
> > e.g. today sgmrwritev() will largely be called with interrupts held. Plenty
> > reads too. I doubt that undoing a few HOLD_INTERRUPTS() is a meaningful part
> > of the necessary work.
>
> Right, exactly the case I was thinking of: if some hypothetical future
> network smgr wants to be able to process *some* kinds of things
> carefully while waiting for the network. I don't think we want to
> constrain ourselves to NFS-style "pretend it's local and make it
> non-interruptible" without any escape hatches, but you're quite right
> that that's probably a whole research project of its own and we just
> haven't refined the interrupt system enough for that yet, so yeah I
> see how you arrived at that conclusion and it makes sense.
Here's a proposed patch for this. It turns out that the bug might already be
reachable, even without defining FDDEBUG. There's a debug ereport() in
register_dirty_segment() - but it's hard to reach in practice.
I don't really know whether that means we ought to backpatch or not - which
makes me want to be conservative and not backpatch.
Greetings,
Andres Freund
Attachment | Content-Type | Size |
---|---|---|
v2-0001-smgr-Hold-interrupts-in-most-smgr-functions.patch | text/x-diff | 10.2 KB |
v2-0002-smgr-Make-SMgrRelation-initialization-safer-again.patch | text/x-diff | 1.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Jelte Fennema-Nio | 2025-03-17 23:52:05 | Re: RFC: Allow EXPLAIN to Output Page Fault Information |
Previous Message | Masahiko Sawada | 2025-03-17 23:44:26 | Re: Separate GUC for replication origins |