Re: index prefetching

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>
Subject: Re: index prefetching
Date: 2023-06-09 00:40:15
Message-ID: CAH2-Wzk7T9K3d9_NY+jEXp2qQGMYoP=gZMoR8q1Cv57SxAw1OA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 8, 2023 at 4:38 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> This is conceptually a "mini bitmap index scan", though one that takes
> place "inside" a plain index scan, as it processes one particular leaf
> page. That's the kind of design that "plain index scan vs bitmap index
> scan as a continuum" leads me to (a little like the continuum between
> nested loop joins, block nested loop joins, and merge joins). I bet it
> would be practical to do things this way, and help a lot with some
> kinds of queries. It might even be simpler than avoiding excessive
> prefetching using an LRU cache thing.

I'll now give a simpler (though less realistic) example of a case
where "mini bitmap index scan" would be expected to help index scans
in general, and prefetching during index scans in particular.
Something very simple:

create table bitmap_parity_test(randkey int4, filler text);
create index on bitmap_parity_test (randkey);
insert into bitmap_parity_test select (random()*1000),
repeat('filler',10) from generate_series(1,250) i;

This gives me a table with 4 pages, and an index with 2 pages.

The following query selects about half of the rows from the table:

select * from bitmap_parity_test where randkey < 500;

If I force the query to use a bitmap index scan, I see that the total
number of buffers hit is exactly as expected (according to
EXPLAIN(ANALYZE,BUFFERS), that is): there are 5 buffers/pages hit. We
need to access every single heap page once, and we need to access the
only leaf page in the index once.

I'm sure that you know where I'm going with this already. I'll force
the same query to use a plain index scan, and get a very different
result. Now EXPLAIN(ANALYZE,BUFFERS) shows that there are a total of
89 buffers hit -- 88 of which must just be the same 5 heap pages,
again and again. That's just silly. It's probably not all that much
slower, but it's not helping things. And it's likely that this effect
interferes with the prefetching in your patch.

Obviously you can come up with a variant of this test case where
bitmap index scan does way fewer buffer accesses in a way that really
makes sense -- that's not in question. This is a fairly selective
index scan, since it only touches one index page -- and yet we still
see this difference.

(Anybody pedantic enough to want to dispute whether or not this index
scan counts as "selective" should run "insert into bitmap_parity_test
select i, repeat('actshually',10) from generate_series(2000,1e5) i"
before running the "randkey < 500" query, which will make the index
much larger without changing any of the details of how the query pins
pages -- non-pedants should just skip that step.)

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-06-09 01:21:47 Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15
Previous Message Jaime Casanova 2023-06-09 00:37:13 Re: git.postgresql.org seems to be down