Re: index prefetching

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Andres Freund <andres(at)anarazel(dot)de>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2024-11-11 18:03:07
Message-ID: CAH2-WznFwgU3AddTqnvJABX5xo-9upG6NiX+2s0eaFhFj6tRAg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 11, 2024 at 12:23 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > I think that holding onto pins and whatnot has almost nothing to do
> > with the index AM as such -- it's about protecting against unsafe
> > concurrent TID recycling, which is a table AM/heap issue. You can make
> > a rather weak argument that the index AM needs it for _bt_killitems,
> > but that seems very secondary to me (if you go back long enough there
> > are no _bt_killitems, but the pin thing itself still existed).
>
> Much of this discussion is going over my head, but I have a comment on
> this part. I suppose that when any code in the system takes a pin on a
> buffer page, the initial concern is almost always to keep the page
> from disappearing out from under it.

That almost never comes up in index AM code, though -- cases where you
simply want to avoid having an index page evicted do exist, but are
naturally very rare. I think that nbtree only does this during page
deletion by VACUUM, since it works out to be slightly more convenient
to hold onto just the pin at one point where we quickly drop and
reacquire the lock. Index AMs find very little use for pins that don't
naturally coexist with buffer locks. And even the supposed exception
that happens for page deletion could easily be replaced by just
dropping the pin and the lock (there'd just be no point in it).

I almost think of "pin held" and "buffer lock held" as synonymous when
working on the nbtree code, even though you have this one obscure page
deletion case where that isn't quite true (plus the TID recycle safety
business imposed by heapam). As far as protecting the structure of the
index itself is concerned, holding on to buffer pins alone does not
matter at all.

I have a vague recollection of hash doing something novel with cleanup
locks, but I also seem to recall that that had problems -- I think
that we got rid of it not too long back. In any case my mental model
is that cleanup locks are for the benefit of heapam, never for the
benefit of index AMs themselves. This is why we require cleanup locks
for nbtree VACUUM but not nbtree page deletion, even though both
operations perform precisely the same kinds of page-level
modifications to the index leaf page.

> There might be a few exceptions,
> but hopefully not many. So I suppose what is happening here is that
> index AM pins an index page so that it can read that page -- and then
> it defers releasing the pin because of some interlocking concern. So
> at any given moment, there's some set of pins (possibly empty) that
> the index AM is holding for its own purposes, and some other set of
> pins (also possibly empty) that the index AM no longer requires for
> its own purposes but which are still required for heap/index
> interlocking.

That summary is correct, but FWIW I find the emphasis on index pins
slightly odd from an index AM point of view.

The nbtree code virtually always calls _bt_getbuf and _bt_relbuf, as
opposed to independently acquiring pins and locks -- that's why "lock"
and "pin" seem almost synonymous to me in nbtree contexts. Clearly no
index AM should hold onto a buffer lock for more than an instant, so
my natural instinct is to wonder why you're even talking about buffer
pins or buffer locks that the index AM cares about directly.
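
To illustrate why "pin" and "lock" blur together in this calling pattern, here is a minimal toy sketch. It is not nbtree code: ToyBuffer, toy_getbuf, toy_relbuf, and toy_pin_without_lock are hypothetical names invented for illustration, and they only mimic the idea that one call acquires pin and lock together while the matching call releases both.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a shared buffer (an assumption, not PostgreSQL's struct). */
typedef struct ToyBuffer
{
    int  pin_count;   /* number of pins currently held on the page */
    bool locked;      /* true while a content lock is held */
} ToyBuffer;

/* Hypothetical analogue of the "get" call: pin and lock in one step. */
static void
toy_getbuf(ToyBuffer *buf)
{
    assert(!buf->locked);
    buf->pin_count++;
    buf->locked = true;
}

/* Hypothetical analogue of the "release" call: unlock and unpin in one step. */
static void
toy_relbuf(ToyBuffer *buf)
{
    assert(buf->locked && buf->pin_count > 0);
    buf->locked = false;
    buf->pin_count--;
}

/* True only in the state this calling pattern never produces: pin, no lock. */
static bool
toy_pin_without_lock(const ToyBuffer *buf)
{
    return buf->pin_count > 0 && !buf->locked;
}
```

With only these two entry points, a caller never observes a page that is pinned but unlocked, which is the sense in which the two concepts feel synonymous here.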

As I said to Tomas, the index AM does sometimes need to hold onto a
leaf page pin in order to perform _bt_killitems correctly.
But this is only because it needs to reason about concurrent TID
recycling. So this is also not really any kind of exception.
(_bt_killitems is even prepared to reason about cases where no pin was
held at all, and has been since commit 2ed5b87f96.)

> The second set of pins could possibly be managed in some
> AM-agnostic way. The AM could communicate that after the heap is done
> with X set of TIDs, it can unpin Y set of pages. But the first set of
> pins are of direct and immediate concern to the AM.
>
> Or at least, so it seems to me. Am I confused?

I think that this is exactly what I propose to do, said in a different
way. (Again, I wouldn't have expressed it in this way because it seems
obvious to me that buffer pins don't have nearly the same significance
to an index AM as they do to heapam -- they have no value in
protecting the index structure, or helping an index scan to reason
about concurrency that isn't due to a heapam issue.)
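
The split between the two sets of pins can be sketched as a toy model. This is only an illustration under stated assumptions, not the proposed patch: ToyLeafBatch and the toy_* functions are hypothetical names, and the rule shown (drop the pin once the index AM is done with the page and the heap has consumed every TID from it) is a simplified reading of the idea discussed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy record for one leaf page's worth of TIDs (hypothetical structure). */
typedef struct ToyLeafBatch
{
    int  tids_outstanding;  /* TIDs handed out, not yet consumed by the heap */
    bool am_done;           /* index AM no longer needs the page itself */
    bool pinned;            /* leaf page pin still held */
} ToyLeafBatch;

static void
toy_batch_init(ToyLeafBatch *b, int ntids)
{
    b->tids_outstanding = ntids;
    b->am_done = false;
    b->pinned = true;       /* pin taken when the page was read */
}

/* Release the pin only when neither the AM nor the heap still needs it. */
static void
toy_maybe_unpin(ToyLeafBatch *b)
{
    if (b->pinned && b->am_done && b->tids_outstanding == 0)
        b->pinned = false;
}

/* The index AM has finished reading the page for its own purposes. */
static void
toy_am_done(ToyLeafBatch *b)
{
    b->am_done = true;
    toy_maybe_unpin(b);
}

/* The heap side reports that it has finished with one TID from the batch. */
static void
toy_heap_consumed_tid(ToyLeafBatch *b)
{
    assert(b->tids_outstanding > 0);
    b->tids_outstanding--;
    toy_maybe_unpin(b);
}
```

In this model the pin outlives the AM's own use of the page purely because of the outstanding TIDs, which is the heap-driven interlock rather than anything protecting the index structure.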

Does that make sense?

--
Peter Geoghegan
