Re: Unused expression indexes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Maciek Sakrejda <maciek(at)pganalyze(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Unused expression indexes
Date: 2024-08-06 20:25:12
Message-ID: 1827623.1722975912@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Maciek Sakrejda <maciek(at)pganalyze(dot)com> writes:
> In a blog post [1], Bruce Momjian notes that expression indexes can
> help with planning even if they're not used directly. But the examples
> cited in that post are vague (i.e., they improve stats, but it's not
> clear how they could change plans), and Bruce's answer to a comment
> [2] suggests that this is not documented.

> Is there any more info on this mechanism? Specifically, if one has
> unused expression indexes (according to pg_stat_user_indexes), is it
> safe to drop them? Or could they be providing statistics that
> materially affect query planning even though the indexes themselves
> are unused?

Expression indexes definitely can affect planning, because ANALYZE
collects stats on the values of those expressions. As a trivial
example,

regression=# create table foo (x1 float8);
CREATE TABLE
regression=# insert into foo select 10 * random() from generate_series(1,10000);
INSERT 0 10000
regression=# analyze foo;
ANALYZE
regression=# explain analyze select * from foo where sqrt(x1) < 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..195.00 rows=3333 width=8) (actual time=0.009..0.546 rows=1028 loops=1)
Filter: (sqrt(x1) < '1'::double precision)
Rows Removed by Filter: 8972
Planning Time: 0.065 ms
Execution Time: 0.572 ms
(5 rows)

The planner has no info about the values of sqrt(x1), so you get a
default estimate (one-third) of the selectivity of the WHERE clause.
But watch this:

regression=# create index on foo (sqrt(x1));
CREATE INDEX
regression=# analyze foo;
ANALYZE
regression=# explain analyze select * from foo where sqrt(x1) < 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=24.24..84.63 rows=1026 width=8) (actual time=0.078..0.229 rows=1028 loops=1)
Recheck Cond: (sqrt(x1) < '1'::double precision)
Heap Blocks: exact=45
-> Bitmap Index Scan on foo_sqrt_idx (cost=0.00..23.98 rows=1026 width=0) (actual time=0.068..0.068 rows=1028 loops=1)
Index Cond: (sqrt(x1) < '1'::double precision)
Planning Time: 0.113 ms
Execution Time: 0.259 ms
(7 rows)

Now there are stats about the values of sqrt(x1), allowing a far more
accurate selectivity estimate to be made. In this particular example
there's no change of plan (it would have used the index anyway), but
certainly a different rowcount estimate can make a big difference.

This mechanism is quite ancient, and in principle it's now superseded
by extended statistics. For example, I can drop this index and
instead do

regression=# drop index foo_sqrt_idx;
DROP INDEX
regression=# create statistics foostats on sqrt(x1) from foo;
CREATE STATISTICS
regression=# analyze foo;
ANALYZE
regression=# explain analyze select * from foo where sqrt(x1) < 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..195.00 rows=1026 width=8) (actual time=0.006..0.479 rows=1028 loops=1)
Filter: (sqrt(x1) < '1'::double precision)
Rows Removed by Filter: 8972
Planning Time: 0.079 ms
Execution Time: 0.503 ms
(5 rows)

So the accurate rowcount estimate is still obtained in this example;
and we're not incurring any index maintenance costs, only ANALYZE
costs that are going to be roughly the same either way.

However, I am not certain that extended statistics are plugged into
all the places where the older mechanism applies. Tomas Vondra might
have a better idea than I of where gaps remain in that.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-08-06 20:41:45 Re: Minor refactorings to eliminate some static buffers
Previous Message Heikki Linnakangas 2024-08-06 20:12:12 Re: Minor refactorings to eliminate some static buffers