Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition

From: tender wang <tndrwang(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition
Date: 2024-01-03 06:46:06
Message-ID: CAHewXN=KYgBa0ER5JMNV=rGFU=Jes0sdiQFkdfq6mvBmi5WENA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Michael Paquier <michael(at)paquier(dot)xyz> 于2023年12月31日周日 14:38写道:

> On Thu, Dec 28, 2023 at 02:36:46PM +0800, Richard Guo wrote:
> > On Wed, Dec 27, 2023 at 5:08 PM tender wang <tndrwang(at)gmail(dot)com> wrote:
>
> (Tender Wang.. I know somebody with this name while doing Postgres
> stuff in the past, are you the same guy?)
>

Sorry, I should not be the guy you know. I started using this name to
send emails in the community only since last year.

>> Yeah, that's true. I analyze this issue again, and I think the root
> cause
> >> is the " buf_state &= BM_VALID" .
> >
> > Nice catch. I believe the intention is to clear the BM_VALID bit.
>
> Hmm. Yeah. I'd need to think about that a bit more carefully, but it
> looks like you are right here.. Nice catch. That looks like an
> unfortunate typo from 31966b151e6a when local buffers are handled.
>
>
I would like to share a further analysis of this issue.
This issue will not happen if we have enough disk space. Because
LocalBufHash will not have an entry which is goning to be the extending
block.
So in ExtendBufferedRelLocal(), the found variable will always be false
when we search LocalBufHash. Then 'if (found) {...} 'branch will have no
change to enter,
if disk has enough space. So, this issue had never been exposed before.

What happen when disk have limited space?
At a certain moment, smgrzeroextend() will report error because of no disk
space, but the tag has already been inserted into the LocalBufHash.
And we don't have remove the tag. So we will enter the 'if (found) {...}'
branch when we have enough disk space, which will set the buf_state wrong
status.

> > By the way, I wonder if it would be worthwhile to define new macros for
> > bit operations such as set_bit, clear_bit, test_bit, and so on, so that
> > we can avoid such typos in the future.
>
> Not sure. This is hiding the code behind more layers without bringing
> extra value if you know how to read bitwise operations, which is
> something I'd expect somebody to get pretty good at when hacking on
> Postgres because that's all around the code tree. And this would
> create conflict noises when backpatching. Hence, -1.
>

We had better add more comments to eplain when enter if(found) {...}
branch, and when enter the ‘else {...}' branch in
ExtendBufferedRelLocal().

> --
> Michael
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Richard Guo 2024-01-03 11:01:03 Re: Postgres 16.1 - Bug: cache entry already complete
Previous Message Andrei Lepikhov 2024-01-03 05:56:16 Re: BUG #18261: Inconsistent results of SELECT affected by joined subqueries