Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
Date: 2024-08-15 20:58:25
Message-ID: 79103c37-0d26-4250-a80c-b797a0d9f6f8@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi! Thank you for your work on this subject!

On 15.08.2024 22:22, Peter Geoghegan wrote:
> Attached patch has EXPLAIN ANALYZE display the total number of
> primitive index scans for all 3 kinds of index scan node. This is
> useful for index scans that happen to use SAOP arrays. It also seems
> almost essential to offer this kind of instrumentation for the skip
> scan patch [1]. Skip scan works by reusing all of the Postgres 17 work
> (see commit 5bf748b8) to skip over irrelevant sections of a composite
> index with a low cardinality leading column, so it has all the same
> issues.
I think that it is enough to pass the IndexScanDesc parameter to the
function - this saves us from having to define the planstate type twice.

For this reason, I suggest some changes that I think may improve your patch.

> One reason to have this patch is to differentiate between similar
> cases involving simple SAOP arrays. The user will have some reasonable
> way of determining how a query such as this:
>
> pg(at)regression:5432 [2070325]=# explain (analyze, buffers, costs off,
> summary off)
> select
> abalance
> from
> pgbench_accounts
> where
> aid in (1, 2, 3, 4, 5);
> ┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ QUERY PLAN
> │
> ├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual
> time=0.007..0.008 rows=5 loops=1) │
> │ Index Cond: (aid = ANY ('{1,2,3,4,5}'::integer[]))
> │
> │ Primitive Index Scans: 1
> │
> │ Buffers: shared hit=4
> │
> └──────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (4 rows)
>
> ...differs from a similar query, such as this:
>
> pg(at)regression:5432 [2070325]=# explain (analyze, buffers, costs off,
> summary off)
> select
> abalance
> from
> pgbench_accounts
> where
> aid in (1000, 2000, 3000, 4000, 5000);
> ┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ QUERY PLAN
> │
> ├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual
> time=0.006..0.012 rows=5 loops=1) │
> │ Index Cond: (aid = ANY ('{1000,2000,3000,4000,5000}'::integer[]))
> │
> │ Primitive Index Scans: 5
> │
> │ Buffers: shared hit=20
> │
> └──────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (4 rows)
>
> Another reason to have this patch is consistency. We're only showing
> the user the number of times we've incremented
> pg_stat_user_tables.idx_scan in each case. The fact that
> pg_stat_user_tables.idx_scan counts primitive index scans like this is
> nothing new. That issue was only documented quite recently, as part of
> the Postgres 17 work, and it seems quite misleading. It's consistent,
> but not necessarily nonsurprising. Making it readily apparent that
> there is more than one primitive index scan involved here makes the
> issue less surprising.
>
> Skip scan
> ---------
>
> Here's an example with this EXPLAIN ANALYZE patch applied on top of my
> skip scan patch [1], using the tenk1 table left behind when the
> standard regression tests are run:
>
> pg(at)regression:5432 [2070865]=# create index on tenk1 (four, stringu1);
> CREATE INDEX
> pg(at)regression:5432 [2070865]=# explain (analyze, buffers, costs off,
> summary off)
> select
> stringu1
> from
> tenk1
> where
> -- Omitted: the leading column on "four"
> stringu1 = 'YGAAAA';
> ┌───────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ QUERY PLAN
> │
> ├───────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ Index Only Scan using tenk1_four_stringu1_idx on tenk1 (actual
> time=0.011..0.017 rows=15 loops=1) │
> │ Index Cond: (stringu1 = 'YGAAAA'::name)
> │
> │ Heap Fetches: 0
> │
> │ Primitive Index Scans: 5
> │
> │ Buffers: shared hit=11
> │
> └───────────────────────────────────────────────────────────────────────────────────────────────────┘
> (5 rows)
>
> Notice that there are 5 primitive index scans here. That's what I'd
> expect, given that there are exactly 4 distinct "logical subindexes"
> implied by our use of a leading column on "four" as the scan's skip
> column. Under the hood, an initial primitive index scan locates the
> lowest "four" value. There are then 4 additional primitive index scans
> to locate the next "four" value (needed when the current "four" value
> gets past the value's "stringu1 = 'YGAAAA'" tuples).
>
> Obviously, the cardinality of the leading column predicts the number
> of primitive index scans at runtime. But it can be much more
> complicated of a relationship than what I've shown here may suggest.
> Skewness matters, too. Small clusters of index tuples with unique
> leading column values will greatly increase column
> cardinality/ndistinct, without a commensurate increase in the cost of
> a skip scan (that skips using that column). Those small clusters of
> unique values will appear on the same few leaf pages. It follows that
> they cannot substantially increase the number of primitive scans
> required at runtime -- they'll just be read all together at once.
>
> An important goal of my design for skip scan is that we avoid the need
> for special index paths within the optimizer. Whether or not we skip
> is always a runtime decision (when a skippable index attribute exists
> at all). The optimizer needs to know about skipping for costing
> purposes only -- all of the required optimizer changes are in
> selfuncs.c. That's why you didn't see some kind of special new index
> scan node here -- you just saw the number of primitive index scans.
>
> My motivation for working on this EXPLAIN ANALYZE patch is primarily
> skip scan. I don't think that it necessarily matters, though. I think
> that this patch can be treated as independent work. It would have been
> weird to not bring it up skip scan even once here, though.
>
> Presentation design choices
> ---------------------------
>
> I've used the term "primitive index scan" for this. That is the
> existing user-visible terminology [2], though I suppose that that
> could be revisited now.
>
> Another quasi-arbitrary design choice: I don't break out primitive
> index scans for scan nodes with multiple loops (e.g., the inner side
> of a nested loop join). The count of primitive scans accumulates
> across index_rescan calls. I did things this way because it felt
> slightly more logical to follow what we show for "Buffers" --
> primitive index scans are another physical cost. I'm certainly not
> opposed to doing that part differently. It doesn't have to be one or
> the other (could break it out both ways), if people think that the
> added verbosity is worth it.
>
> I think that we shouldn't be counting calls to _bt_first as a
> primitive index scan unless they either call _bt_search or
> _bt_endpoint to descend the index (in the case of nbtree scans). This
> means that cases where we detect a contradictory qual in
> _bt_preprocess_keys should count as having zero primitive index scans.
> That is technically an independent thing, though it seems far more
> logical to just do it that way.
>
> Actually, I think that there might be existing bugs on HEAD, with
> parallel index scan -- I think we might be overcounting. We're not
> properly accounting for the fact that parallel workers usually don't
> perform a primitive index scan when their backend calls into
> _bt_first. I wonder if I should address that separately, as a bug
> fix...
>
> [1]https://www.postgresql.org/message-id/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP%2BG4bw%40mail.gmail.com
> [2]https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW
> -- see "Note" box
> --
> Peter Geoghegan

To be honest, I don't quite understand how information in explain
analyze about the number of used primitive indexes
will help me improve my database system as a user. Perhaps I'm missing
something.

Maybe it can tell me which columns are best to create an index on or
something like that?

Could you explain it me, please?

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
diff.no-cfbot text/plain 2.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-08-15 21:01:59 Re: pgsql: Introduce hash_search_with_hash_value() function
Previous Message Doruk Yilmaz 2024-08-15 20:53:17 Re: [Patch] add new parameter to pg_replication_origin_session_setup