From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | Re: EvictUnpinnedBuffer and buffer free list |
Date: | 2025-02-12 05:47:47 |
Message-ID: | CAExHW5vR_m=KddwNyo-59vm_8E_kAZgKe19VyC-aHYK_mK8Dfw@mail.gmail.com |
Lists: | pgsql-hackers |
Thanks a lot Melanie for a very detailed response, a good reference to pin.
On Fri, Jan 31, 2025 at 8:20 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> I don't have an explicit issue with EvictUnpinnedBuffer() putting
> buffers on the freelist -- it seems like that could be fine. But since
> it is for testing/development, I don't see what benefits it will have
> to users. It sounds like you saw issues when developing -- what kinds
> of issues?
I can't say it was an issue; maybe it was an expectation mismatch. In my
experiment, where the entire buffer pool was full, I was expecting the
evicted buffer to be available immediately for the next page request.
I didn't expect another eviction. I kinda thought that the buffer was
lost, but it was returned. The next thing I tried was to evict many
buffers together using EvictUnpinnedBuffer(), and those buffers took a
long time to return to the pool because the clock sweep wasn't as fast
as the eviction. But that's not a regular scenario, so maybe the current
behaviour is okay to avoid the lock contention.
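To illustrate the mismatch, here is a toy C model. It is not PostgreSQL code: ToyPool, toy_init(), toy_evict() and toy_get_buffer() are made-up names, and the model only assumes that allocation tries a freelist first and otherwise falls back to a clock sweep that skips buffers with a non-zero usage count. Locking, pins and dirty pages are ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS 8

/* Toy buffer descriptor; NOT PostgreSQL's BufferDesc. */
typedef struct {
    bool valid;   /* buffer currently holds a page */
    int  usage;   /* clock-sweep usage count */
} ToyBuffer;

typedef struct {
    ToyBuffer buf[NBUFFERS];
    int freelist[NBUFFERS];  /* stack of free buffer ids */
    int nfree;               /* number of entries on the freelist */
    int hand;                /* next buffer the clock sweep looks at */
    int evictions;           /* valid pages displaced by the sweep */
} ToyPool;

/* Fill the pool: every buffer valid, same usage count, empty freelist. */
static void toy_init(ToyPool *p, int usage)
{
    for (int i = 0; i < NBUFFERS; i++) {
        p->buf[i].valid = true;
        p->buf[i].usage = usage;
    }
    p->nfree = 0;
    p->hand = 0;
    p->evictions = 0;
}

/* Invalidate one buffer; optionally also push it on the freelist. */
static void toy_evict(ToyPool *p, int id, bool to_freelist)
{
    p->buf[id].valid = false;
    p->buf[id].usage = 0;
    if (to_freelist) {
        assert(p->nfree < NBUFFERS);
        p->freelist[p->nfree++] = id;
    }
}

/* Get a buffer for a new page: freelist first, then clock sweep. */
static int toy_get_buffer(ToyPool *p)
{
    int id;

    if (p->nfree > 0) {
        id = p->freelist[--p->nfree];
    } else {
        for (;;) {
            id = p->hand;
            p->hand = (p->hand + 1) % NBUFFERS;
            if (p->buf[id].usage == 0)
                break;              /* victim found */
            p->buf[id].usage--;     /* give it another round */
        }
        if (p->buf[id].valid)
            p->evictions++;         /* a live page had to go */
    }
    p->buf[id].valid = true;
    p->buf[id].usage = 1;
    return id;
}
```

In this model, after toy_init(&p, 0) and toy_evict(&p, 5, false), the next toy_get_buffer(&p) returns buffer 0 and displaces a valid page, because the sweep hand reaches another zero-usage buffer before it reaches buffer 5. With toy_evict(&p, 5, true), the same request gets buffer 5 straight from the freelist and nothing else is displaced.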
>
> > The prologue of function InvalidateVictimBuffer() says "/* Helper
> > routine for GetVictimBuffer() ". I believe that it's expected that the
> > buffer will be allocated to some other page, and that's why it doesn't
> > return the buffer to the free list. But in the case of
> > EvictUnpinnedBuffer() we are not using that buffer for any page, so it
> > must be returned to the free list. InvalidateBuffer() does that but
> > its prologue mentions that it's supposed to be used when freeing
> > buffers for relations and databases.
> >
> > I think there are two solutions
> > 1. Use InvalidateBuffer() instead of InvalidateVictimBuffer(). But I am
> > not sure whether that's safe or what other safety measures we have to
> > put in EvictUnpinnedBuffer()
>
> I don't really think we can do this. InvalidateBuffer() waits forever
> to be able to put the buffer on the freelist. That's because it is
> only used when dropping a relation or database. So it can assume (as
> it says in the comments above WaitIO()) that the only reason the
> buffer will be pinned is if someone else is flushing out the page. It
> will always retry -- since the relation is being dropped, no one else
> could be trying to concurrently access it to read it. You can't make
> this assumption in EvictUnpinnedBuffer().
Thanks for the explanation. This option is ruled out then.
>
> > 2. Call StrategyFreeBuffer() after InvalidateVictimBuffer()
>
> I don't know exactly what would be required to make this work, but it
> seems reasonable to try. The only places StrategyFreeBuffer() is used
> are 1) InvalidateBuffer() and 2) when doing relation extension. In the
> first case, we know no one can know about the buffer because we waited
> until all pins were released and the buffer is part of a relation that
> is being dropped. In the second case, I think the buffers we add to
> the freelist are also ones that no one can know about yet because the
> extension hasn't completed. I'm fuzzy on the details here, so I would
> defer to Andres.
>
> Anyway, my gut feeling is that we have to do something to make calling
> StrategyFreeBuffer() safe to do in EvictUnpinnedBuffer(), but I don't
> know what it is.
I think we could enhance the pg_buffercache_evict() function to put the
evicted buffer back on the freelist, with the behaviour controlled by an
argument flag. I haven't explored the feasibility yet. That would leave
EvictUnpinnedBuffer() as is.
--
Best Wishes,
Ashutosh Bapat