Fwd: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition

From: tender wang <tndrwang(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Fwd: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition
Date: 2024-01-05 03:55:48
Message-ID: CAHewXNnUZHVZUggkO_0yp95GQaOUr57WCf_zar6-PxK5+6tghA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Sorry, I forgot to cc other peoples emails. I only sent to Andres.

---------- Forwarded message ---------
发件人: tender wang <tndrwang(at)gmail(dot)com>
Date: 2024年1月5日周五 11:49
Subject: Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after
no-space-left condition
To: Andres Freund <andres(at)anarazel(dot)de>

Andres Freund <andres(at)anarazel(dot)de> 于2024年1月5日周五 02:57写道:

> Hi,
>
> On 2023-12-28 14:36:46 +0800, Richard Guo wrote:
> > On Wed, Dec 27, 2023 at 5:08 PM tender wang <tndrwang(at)gmail(dot)com> wrote:
> >
> > > Alexander Lakhin <exclusion(at)gmail(dot)com> 于2023年12月27日周三 15:00写道:
> > >
> > >> I wonder, if "buf_state &= BM_VALID" is a typo here, maybe it
> supposed to
> > >> be
> > >> "buf_state &= ~BM_VALID" as in ExtendBufferedRelShared()...
> > >>
> > >
> > > Yeah, that's true. I analyze this issue again, and I think the root
> cause
> > > is the " buf_state &= BM_VALID" .
> > >
> >
> > Nice catch.
>
> Indeed.
>
> I wonder how this survived - I remember having written a script to locally
> test this scenario...
>
> Hm, I dimly remember having a hard time making my approach for this work
> for
> temp tables - IIRC used chattr +i to makr the file immutable, which only
> works
> for file descriptors opened after.
>
>
> > I believe the intention is to clear the BM_VALID bit.
>
> Indeed.
>
>
> These paths had many bugs over the years, they are hard to hit in testing
> but
> regularly hit in production. In a more modern world we'd have a unit test
> for
> these scenarios, in isolation. But that's hard at the moment.
>
> I have been wondering if it's worth and possible to write a C level test
> for
> them, perhaps in regress.c. But right now that seems hard because of the
> DropRelationBuffers() in smgrtruncate() (seems like a layering violation to
> have it there, but it's probably too risky to move at this point).
>
> But perhaps this could actually serve as a good first case for Michael's
> failure injection patch? A failure injection to make
> FileFallocate()/FileZero() fail should just be a few lines.
>

Yeah, agree! This is a great opportunity to introduce advanced testing
techniques such as Michael's fault injection.

We generally have pretty much no coverage for out-of-space, partial
> read/write, EINTR of file operations, because that's hard to do with the
> current test infrastructure. So this might be a nice first case.
>

AFAIK for out-of-space cases:
1、Checkpointer process and WalWriter process would exit, and the user's
session will be disconnected
2、restart instance will fail if it has redo logs to replay. This will
influence availability.

1 and 2 are not user-friendly.

> Greetings,
>
> Andres Freund
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2024-01-05 11:28:37 Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition
Previous Message Hayato Kuroda (Fujitsu) 2024-01-05 03:54:52 RE: BUG #18267: Logical replication bug: data is not synchronized after Alter Publication.