Quick Links

Re: Less selective index chosen unexpectedly

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	James Coleman <jtc331(at)gmail(dot)com>
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: Less selective index chosen unexpectedly
Date:	2021-05-18 21:34:20
Message-ID:	3375038.1621373660@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

James Coleman <jtc331(at)gmail(dot)com> writes:
> Specifically we have a table (simplified repro case):
> create table items(d date, t text, fk integer);
> create index on items(d);
> create index on items(t, fk, d);

> For a query like:
> select * from items where d = '2021-05-18' and fk = 1 and t = 'type0' limit
> 1;

> It's possible to get either an index scan on items_d_idx with a filter on
> "fk" and "t" or an index scan on items_t_fk_d_idx without the need for a
> filter. Even if both plans estimate a low cost and a single row, it seems
> to be that the scan on the index containing more columns (and no filter)
> ought to be pretty strongly preferred unless the cost or estimate rows is
> dramatically higher.

Actually not. The multi-column index is going to be physically larger,
which means that the estimated cost to descend into it is going to be
larger just on the grounds of more I/O. The extra comparisons to
include the additional columns in that search aren't free either.
Since you've specified LIMIT 1, you've also given up much of any cost
advantage that might accrue to scanning items after the first match.
Hence, the only way that the multi-column index is going to win out is
if the extra filter conditions are estimated to be selective enough
(relative to the condition on "d") that we have to scan multiple
entries in the d-only index before getting the first match.

Experimenting by adding

explain select * from items where d = '2021-05-18' limit 1;

(to see what the estimated selectivity of just that condition is)
at each step of your script, I see that in the trouble cases,
the "d" condition is by itself estimated to match only one row.
If that were accurate, the planner would be quite right to pick
the smaller index.

The only thing I see that's really going wrong here is marginally
inaccurate stats, especially right after a big insertion that's
not reflected into the stats yet. I'm not sure there's much to
improve there. You could increase the stats target some more,
though of course that just pushes out the size of table where
the issue will appear.

regards, tom lane

In response to

Less selective index chosen unexpectedly at 2021-05-18 20:36:02 from James Coleman

Responses

Re: Less selective index chosen unexpectedly at 2021-05-18 21:50:48 from Alvaro Herrera
Re: Less selective index chosen unexpectedly at 2021-05-19 14:54:10 from James Coleman

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Alvaro Herrera	2021-05-18 21:50:48	Re: Less selective index chosen unexpectedly
Previous Message	Peter Geoghegan	2021-05-18 21:16:27	Re: Less selective index chosen unexpectedly