Re: index prefetching

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2024-09-30 21:16:25
Message-ID: ff251778-d02b-42f7-b5f3-63af50e43a7a@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Here's another version of this patch series, with a couple significant
improvements, mostly in the indexam.c and executor layers. The AM code
remains almost untouched.

I have focused on the simplification / cleanup of the executor code
(nodeIndexscan and nodeIndexonlyscan). In the previous version there was
quite a bit of duplicated code - both for the "regular" index scans and
index-only scans, the "while getnext" block was copied, calling either
the non-batched or batched functions.

That is now mostly gone. I managed to move 99% of the differences to the
indexam.c layer, so that the executor simply calls index_getnext_tid()
or index_getnext_slot(), and that decides *internally* whether to use
the batched version, or not. This means the only new function added to
the indexam API is index_batch_add(), which the index AMs use to add
items into the batch. For the executor the code remains the same.

The only exception is that index-only scans need a way to guide the
prefetching based on the visibility map (we don't want to prefetch
all-visible pages, because skipping those is the whole point of IOS).
And we also want a way to share the VM check, so that it doesn't need to
happen twice. Because for fully-cached workloads this is too expensive.

Doing the first part is trivial - we simply define a callback for the
batching, responsible for inspecting the VM and making a decision.
That's easy, and fairly clean. Passing the VM check result back is a bit
awkward, though. The current patch deals with it by just executing the
callback again (which just returns the cached result), or doing the VM
check locally (for non-batched version). It's not pretty, because it
leaks knowledge of the batching into the executor.

I'd appreciate ideas how to solve this in a nicer way.

I've also split the nbtree changes into a separate patch. It used to be
included in the first patch, but I've decided to keep it separate, just
like for the other AMs.

I'm now fairly happy with both the executor layer and the (much smaller)
indexam.c code, and I think it's in a good enough shape for a review.

The next item on my TODO is cleanup of the nbtree code, particularly the
mark/restore part in patch 0003. So I'll work on that next. I also plan
to get back to the index_batch_prefetch() code, which is not wrong but
would benefit from a bit of cleanup / clarification etc.

regards

--
Tomas Vondra

Attachment Content-Type Size
v20240930-0001-WIP-index-batching-prefetching.patch text/x-patch 59.5 KB
v20240930-0002-WIP-batching-for-nbtree-indexes.patch text/x-patch 15.0 KB
v20240930-0003-PoC-support-for-mark-restore-for-nbtree.patch text/x-patch 13.1 KB
v20240930-0004-WIP-batching-for-hash-indexes.patch text/x-patch 14.3 KB
v20240930-0005-WIP-batching-for-gist-indexes.patch text/x-patch 11.7 KB
v20240930-0006-WIP-batching-for-sp-gist-indexes.patch text/x-patch 7.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-09-30 21:21:30 Re: Changing the state of data checksums in a running cluster
Previous Message Alexey Orlov 2024-09-30 20:15:38 Patch: Show queries of processes holding a lock