Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
Date: 2025-02-17 17:56:12
Message-ID: CAH2-WzmebSkeKPGw7TEaNw9=Qx-X8fAnFw916Fd2V8VVqYqqaQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 27, 2024 at 8:36 AM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> > I think we should divide them because by dividing the total buffer usage by the number of loops, user finds the average buffer consumption per loop. This gives them a clearer picture of the resource intensity per basic unit of work.
>
> I disagree; I think the whole "dividing by number of loops and
> rounding up to integer" was the wrong choice for tuple count, as that
> makes it difficult if not impossible to determine the actual produced
> count when it's less than the number of loops. Data is lost in the
> rounding/processing, and I don't want to have lost that data.

I think that you're definitely right about this. I changed my mind (or
changed it back to my original position) recently, when I noticed how
bad the problem was with parallel index scans: nloops generally comes
from the number of workers (including the leader) for parallel scans,
and so it wasn't that hard to see "Index Searches: 0" with the latest
version (the version that started to divide by nloops). Obviously,
that behavior is completely ridiculous. Let's not do that.

The precedent to follow here is "Heap Fetches: N" (in the context of
index-only scans), which also doesn't divide by nloops. Likely because
the same sorts of issues arise with heap fetches.

> Same applies for ~scans~ searches: If we do an index search, we should
> show it in the count as total sum, not partial processed value. If a
> user is interested in per-loopcount values, then they can derive that
> value from the data they're presented with; but that isn't true when
> we present only the divided-and-rounded value.

I recently came across a good example of how showing "Index Searches:
N" is likely to be useful in the context of nested loop joins. The
example comes from the recently committed ORs-to-SAOP join
transformation work (commit 627d6341).

If I run the test case (taken from src/test/regress/sql/join.sql) with
EXPLAIN ANALYZE, the output confirms that the optimization added by
that commit works particularly well:

pg(at)regression:5432 [205457]=# explain (analyze, costs off, SUMMARY off)
select count(*)
from tenk1 t1, tenk1 t2
where t2.thousand = t1.tenthous or t2.thousand = t1.unique1 or
t2.thousand = t1.unique2;
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate (actual time=13.761..13.762 rows=1 loops=1)

│ Buffers: shared hit=24201

│ -> Nested Loop (actual time=0.011..12.928 rows=20000 loops=1)

│ Buffers: shared hit=24201

│ -> Seq Scan on tenk1 t1 (actual time=0.004..0.507
rows=10000 loops=1) │
│ Buffers: shared hit=345

│ -> Index Only Scan using tenk1_thous_tenthous on tenk1 t2
(actual time=0.001..0.001 rows=2 loops=10000) │
│ Index Cond: (thousand = ANY (ARRAY[t1.tenthous,
t1.unique1, t1.unique2])) │
│ Index Searches: 11885

│ Heap Fetches: 0

│ Buffers: shared hit=23856

└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

As you can see, there are 10,000 executions of the inner index-only
scan here, which has a SAOP qual whose array will always have 3 array
elements. That means that the best possible case is 10,000 index
searches, and the worst possible case is 30,000 index searches. We
actually see "Index Searches: 11885" -- not bad!

The main factor that gets us pretty close to that best possible case
is a certain kind of redundancy: many individual inner index scans
have duplicate array elements, allowing nbtree preprocessing to shrink
the array when as it is sorted and deduplicated -- the array used
during many individual inner scan executions has as few as one or two
array elements. Another contributing factor is the prevalence of "out
of bounds" array elements: many individual SAOP arrays/inner scans
have 2 array elements that are both greater than 1,000. That'll allow
nbtree to get away with needing only one index search for all
out-of-bounds array elements. That is, it allows nbtree to determine
that all out-of-bounds elements can't possibly have any matches using
only one index search (a search that lands on the rightmost leaf page,
where no matches for any out-of-bounds element will be found).

Of course, this can only be surmised from the EXPLAIN ANALYZE output
shown because I went back to not dividing by nloops within explain.c.
A huge amount of useful information would be lost in cases like this
if we divide by nloops. So, again, let's not do it that way.

It'd be just as easy to surmise what's going on here if the inner
index scan happened to be a plain index scan. That would make the
"Buffers" output include heap buffer hits, which would usually make it
impossible to infer how many of the "Buffers hit" came from the index
structure. My analysis didn't rely on "Buffers" at all, though (only
on "Index Searches: 11885" + "loops=10000"), so everything I pointed
out would be just as readily apparent.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ayush Vatsa 2025-02-17 18:01:46 Clarification on Role Access Rights to Table Indexes
Previous Message Nathan Bossart 2025-02-17 17:53:00 Re: revamp row-security tracking