Quick Links

Re: index prefetching

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject:	Re: index prefetching
Date:	2025-04-22 21:16:49
Message-ID:	CAH2-WzkD5nv4wFhw=Nee-akmW5yRHVG5uc-EF5ean4B_oKt94w@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Apr 22, 2025 at 2:34 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> Yeah, that makes sense, although I've been thinking about this a bit
> differently. I haven't been trying to establish a new "component" to
> manage prefetching. For me the question was what's the right layer, so
> that unnecessary details don't leak into AM and/or executor.

FWIW that basically seems equivalent to what I said. If there's any
difference at all between what each of us has said, then it's only a
difference in emphasis. The "index scan manager" doesn't just manage
prefetching -- it manages the whole index scan, including details that
were previously only supposed to be known inside index AMs. It can do
so while weighing all relevant factors -- regardless of whether
they're related to the index structure or the heap structure.

It would be possible to (say) do everything at the index AM level
instead. But then we'd be teaching index AMs about heap/table AM
related costs, which would be a bad design, primarily because it would
have to duplicate the same logic in every supported index AM. Better
to have one dedicated layer that has an abstract-ish understanding of
both index AM scan costs, and table AM scan costs. It needs to be
abstract, but not too abstract -- costs like "read one index leaf
page" generalize well across all index AMs. And costs like "read one
table AM page" should also generalize quite well, at least across
block-based table AMs.

You primarily care about "doing the layering right", while I primarily
care about "making sure that one layer can see all relevant costs".
ISTM that these are two sides of the same coin.

> It requires exchanging some additional details with the AM, provided by
> the new callbacks.

I think of it as primarily externalizing decisions about index page
accesses. The index AM reads the next leaf page to be read because the
index scan manager tells it to. The index AM performs killitems
exactly as instructed by the index scan manager. And the index AM
doesn't really own as much context about the progress of the scan --
that all lives inside the scan manager instead. The scan manager has a
fairly fuzzy idea about how the index AM organizes data, but that
shouldn't matter.

> It seems the indexam.c achieves both your and mine goals, more or less.

Agreed.

> Yes. I wonder if we should introduce a separate abstraction for this, as
> a subset of indexam.c.

I like that idea.

> My argument was (a) ability to disable prefetching, and fall back to the
> old code if needed, and (b) handling use cases where prefetching does
> not work / is not implemented, even if only temporarily (e.g. ordered
> scan in SP-GiST). Maybe (a) is unnecessarily defensive, and (b) may not
> be needed. Not sure.

We don't need to make a decision on this for some time, but I still
lean towards forcing index AMs to make a choice between this new
interface, and the old amgettuple interface.

> > I don't see why prefetching should be mandatory with this new
> > interface. Surely it has to have adaptive "ramp-up" behavior already,
> > even when we're pretty sure that prefetching is a good idea from the
> > start?
> >
>
> Possibly, I may be too defensive. And perhaps in cases where we know the
> prefetching can't help we could disable that for the read_stream.

Shouldn't the index scan manager be figuring all this out for us,
automatically? Maybe that works in a very trivial way, at first. The
important point is that the design be able to support these
requirements in some later iteration of the feature -- though it's
unlikely to happen in the first Postgres version that the scan manager
thing appears in.

> I think hash should be fairly easy to support. But I was really thinking
> about doing SP-GiST, exactly because it's very different in some
> aspects, and I wanted to validate the design on that (for hash I think
> it's almost certain it's OK).

WFM.

There are still bugs in SP-GiST (and GiST) index-only scans:

https://www.postgresql.org/message-id/CAH2-Wz%3DPqOziyRSrnN5jAtfXWXY7-BJcHz9S355LH8Dt%3D5qxWQ@mail.gmail.com

It would be nice if the new index scan manager interface could fix
that bug, at least in the case of SP-GiST. By generalizing the
approach that nbtree takes, where we hang onto a leaf buffer pin.
Admittedly this would necessitate changes to SP-GiST VACUUM, which
doesn't cleanup lock any pages, but really has to in order to fix the
underlying bug. There are draft patches that try to fix the bug, which
might be a useful starting point.

> I think the cases affected by this the most are index-only scans on
> all-visible tables that fit into shared buffers, with
> correlated/sequential pattern. Or even regular index scans with all data
> in shred buffers.

My hope is that the index scan manager can be taught to back off when
this is happening, to avoid the regressions. Or that it can avoid them
by only gradually ramping up the prefetching. Does that sound
plausible to you?

--
Peter Geoghegan

In response to

Re: index prefetching at 2025-04-22 18:34:35 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jelte Fennema-Nio	2025-04-22 21:28:18	Re: What's our minimum supported Python version?
Previous Message	Christoph Berg	2025-04-22 21:03:29	Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.