Re: EvictUnpinnedBuffer and buffer free list

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: EvictUnpinnedBuffer and buffer free list
Date: 2025-02-12 05:47:47
Message-ID: CAExHW5vR_m=KddwNyo-59vm_8E_kAZgKe19VyC-aHYK_mK8Dfw@mail.gmail.com
Lists: pgsql-hackers

Thanks a lot, Melanie, for the very detailed response; it is a good reference to pin.

On Fri, Jan 31, 2025 at 8:20 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:

>
> I don't have an explicit issue with EvictUnpinnedBuffer() putting
> buffers on the freelist -- it seems like that could be fine. But since
> it is for testing/development, I don't see what benefits it will have
> to users. It sounds like you saw issues when developing -- what kinds
> of issues?

I can't say it was an issue; maybe an expectation mismatch. In my
experiment, where the entire buffer pool was full, I was expecting the
evicted buffer to be available immediately for the next page request.
I didn't expect another eviction. I kind of thought that the buffer was
lost, but it was returned. The next thing I tried was to evict many
buffers together using EvictUnpinnedBuffer(), and those buffers took a
long time to return to the pool because the clock sweep wasn't as fast
as the eviction. But that's not a regular scenario, so maybe the current
behaviour is okay to avoid the lock contention.

>
> > The prologue of function InvalidateVictimBuffer() says "/* Helper
> > routine for GetVictimBuffer() ". I believe that it's expected that the
> > buffer will be allocated to some other page, and that's why it doesn't
> > return the buffer to the free list. But in the case of
> > EvictUnpinnedBuffer() we are not using that buffer for any page, so it
> > must be returned to the free list. InvalidateBuffer() does that but
> > its prologue mentions that it's supposed to be used when freeing
> > buffers for relations and databases.
> >
> > I think there are two solutions
> > 1. Use InvalidateBuffer() instead of InvalidateVictimBuffer(). But I am
> > not sure whether that's safe or what other safety measures we have to
> > put in EvictUnpinnedBuffer()
>
> I don't really think we can do this. InvalidateBuffer() waits forever
> to be able to put the buffer on the freelist. That's because it is
> only used when dropping a relation or database. So it can assume (as
> it says in the comments above WaitIO()) that the only reason the
> buffer will be pinned is if someone else is flushing out the page. It
> will always retry -- since the relation is being dropped, no one else
> could be trying to concurrently access it to read it. You can't make
> this assumption in EvictUnpinnedBuffer().

Thanks for the explanation. This option is ruled out then.

>
> > 2. Call StrategyFreeBuffer() after InvalidateVictimBuffer()
>
> I don't know exactly what would be required to make this work, but it
> seems reasonable to try. The only places StrategyFreeBuffer() is used
> are 1) InvalidateBuffer() and 2) when doing relation extension. In the
> first case, we know no one can know about the buffer because we waited
> until all pins were released and the buffer is part of a relation that
> is being dropped. In the second case, I think the buffers we add to
> the freelist are also ones that no one can know about yet because the
> extension hasn't completed. I'm fuzzy on the details here, so I would
> defer to Andres.
>
> Anyway, my gut feeling is that we have to do something to make calling
> StrategyFreeBuffer() safe to do in EvictUnpinnedBuffer(), but I don't
> know what it is.

I think we could enhance the pg_buffercache_evict() function to put the
evicted buffer back on the freelist, with the behaviour controlled by a
flag argument. I haven't explored the feasibility yet. That would leave
EvictUnpinnedBuffer() as is.
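
To make the expectation mismatch and the flag idea concrete, here is a
toy model in C. It is only a sketch under loud assumptions: ToyPool,
toy_pool_fill(), toy_evict() and toy_get_victim() are hypothetical names
that do not exist in PostgreSQL, and the model ignores pins, buffer
header locks and the freelist lock entirely, which is exactly the part
that would need care in a real patch. It only shows how an invalidated
buffer that is not on the freelist has to wait for the clock sweep.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NBUFFERS 8

/* Toy stand-in for a buffer pool; not PostgreSQL's data structures. */
typedef struct ToyPool
{
    bool valid[TOY_NBUFFERS];       /* does the buffer hold a page? */
    int  usage_count[TOY_NBUFFERS]; /* clock-sweep usage counter */
    int  freelist[TOY_NBUFFERS];    /* stack of free buffer ids */
    int  nfree;                     /* number of entries on the freelist */
    int  hand;                      /* next buffer the sweep examines */
} ToyPool;

/* Fill the pool: every buffer holds a page with the given usage count. */
void
toy_pool_fill(ToyPool *pool, int usage_count)
{
    for (int i = 0; i < TOY_NBUFFERS; i++)
    {
        pool->valid[i] = true;
        pool->usage_count[i] = usage_count;
    }
    pool->nfree = 0;
    pool->hand = 0;
}

/*
 * Evict one buffer. The hypothetical to_freelist flag decides whether
 * the buffer is only invalidated (false) or also pushed onto the
 * freelist (true).
 */
void
toy_evict(ToyPool *pool, int buf, bool to_freelist)
{
    assert(buf >= 0 && buf < TOY_NBUFFERS);
    assert(pool->valid[buf]);

    pool->valid[buf] = false;
    pool->usage_count[buf] = 0;
    if (to_freelist)
        pool->freelist[pool->nfree++] = buf;
}

/*
 * Pick a buffer for a new page: freelist first, otherwise clock sweep.
 * *ticks receives the number of buffers the sweep had to examine.
 * The chosen buffer is marked as holding a freshly loaded page.
 */
int
toy_get_victim(ToyPool *pool, int *ticks)
{
    int buf;

    *ticks = 0;
    if (pool->nfree > 0)
        buf = pool->freelist[--pool->nfree];
    else
    {
        for (;;)
        {
            buf = pool->hand;
            pool->hand = (pool->hand + 1) % TOY_NBUFFERS;
            (*ticks)++;
            if (pool->usage_count[buf] == 0)
                break;          /* found a buffer nobody used recently */
            pool->usage_count[buf]--;
        }
    }
    pool->valid[buf] = true;
    pool->usage_count[buf] = 1;
    return buf;
}
```

In this toy, with every buffer at usage count 1, evicting buffer 5
without the freelist makes the next request examine six buffers before
it reaches buffer 5; with the flag set, the same request takes buffer 5
from the freelist without sweeping at all. If the other buffers already
have usage count 0, the sweep picks buffer 0 instead and evicts a valid
page while buffer 5 sits idle, which matches the "another eviction" I
saw.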

--
Best Wishes,
Ashutosh Bapat
