Less selective index chosen unexpectedly

From: James Coleman <jtc331(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Less selective index chosen unexpectedly
Date: 2021-05-18 20:36:02
Message-ID: CAAaqYe-a_CP3sjtT9P1WEVPeaZfWnhW8uGA-NvQg1dcwToU4FQ@mail.gmail.com
Lists: pgsql-bugs

Note: I've chosen to send this initially to -bugs because it seems to me to
sit just over the fine line between "this could be enhanced" and "this is
always the wrong choice/unexpected behavior", but I'd like feedback on
that, and we can move the discussion to -hackers if people think that'd be
a better fit.

We stumbled across an interesting planner outcome in one of our production
environments on a fairly new database.

This database has a table that contains a date (not timestamp) column whose
value is effectively the current date on insert (it's not exactly this, but
sufficiently so for a test case). That column is indexed by itself. Other
sets of columns are also indexed (some partial).

As new rows are inserted, queries that filter on the current insert-date
value (at least on a new database that is still "small [relative to the
ultimate lifetime of the table]") result in a query plan that uses the date
index rather than the more specific indexes.

Analyzing resolves the issue when it arises.

At this point in the story it seems like an open-and-shut "you just need to
analyze more often/change the auto analyze threshold" case, but I think
there's perhaps a deeper planner design question.
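For context, the knobs I'm referring to are the standard autovacuum analyze
settings. A per-table example (the values here are purely illustrative, not
what we actually run):

alter table items set (
    autovacuum_analyze_scale_factor = 0.01, -- default 0.1: analyze after ~10% of rows change
    autovacuum_analyze_threshold = 1000     -- default 50: added on top of the scale-factor portion
);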

Specifically we have a table (simplified repro case):
create table items(d date, t text, fk integer);
create index on items(d);
create index on items(t, fk, d);

For a query like:
select * from items where d = '2021-05-18' and fk = 1 and t = 'type0' limit 1;
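Since the attached script isn't inlined here, this is a rough sketch of the
shape of the repro (row counts, dates, and the modulo buckets are
illustrative, not the exact attachment): bulk-load and analyze, then insert
a batch of rows for a "new" date without analyzing again, and run the query
above.

create table items(d date, t text, fk integer);
create index on items(d);
create index on items(t, fk, d);

-- Historical data the planner has statistics for.
insert into items
select date '2021-01-01' + (n.i % 100), 'type' || (n.i % 10), n.i % 1000
from generate_series(1, 100000) n(i);
analyze items;

-- New rows for a date the statistics haven't seen; deliberately no analyze
-- afterwards, mimicking the window before autoanalyze fires.
insert into items
select date '2021-05-18', 'type' || (n.i % 10), n.i % 1000
from generate_series(1, 10000) n(i);

explain (analyze, buffers)
select * from items where d = '2021-05-18' and fk = 1 and t = 'type0' limit 1;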

It's possible to get either an index scan on items_d_idx with a filter on
"fk" and "t", or an index scan on items_t_fk_d_idx with no filter at all.
Even if both plans estimate a low cost and a single row, it seems to me
that the scan on the index containing more columns (and needing no filter)
ought to be pretty strongly preferred unless its cost or estimated rows are
dramatically higher. I assume (but haven't verified with a debugger) that
what's happening here is at least partially related to the fuzzy cost
comparison on paths.
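To make the two shapes concrete, they look roughly like this (illustrative
output only; costs and row estimates omitted since they're exactly what
varies between runs):

 Limit
   ->  Index Scan using items_d_idx on items
         Index Cond: (d = '2021-05-18'::date)
         Filter: ((t = 'type0'::text) AND (fk = 1))

versus

 Limit
   ->  Index Scan using items_t_fk_d_idx on items
         Index Cond: ((t = 'type0'::text) AND (fk = 1) AND (d = '2021-05-18'::date))

The second plan never fetches rows only to discard them against the filter,
which is where the extra work in the first one comes from.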

The attached test case demonstrates the full problem at least some of the
time--I've not been able to make it deterministic, but I'd say it shows the
wrong plan choice (resulting in ~5k unnecessarily processed and filtered
rows) roughly 2/3 of the time on my laptop against PG11 (what we're running
on in production). Against ~master I was able to reproduce the wrong plan
choice, but not the large number of filtered rows, until I modified the
second INSERT's case statement to use "n.i > 5000" instead of "n.i < 25000"
-- I assume this is due to some combination of index deduplication and
suffix truncation. Interestingly, with those changes it seems to be more
deterministic against ~master than the original repro case is against 11.

Thoughts?
James Coleman

Attachment Content-Type Size
index_planner_repro_master.sql application/octet-stream 1.2 KB
index_planner_repro_pg11.sql application/octet-stream 1.2 KB
