query plan using partial index expects a much larger number of rows than is possible

From: "Olivier Poquet" <opoquet(at)plumdev(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: query plan using partial index expects a much larger number of rows than is possible
Date: 2020-10-28 22:46:12
Message-ID: 59eb4cf7-4c68-448d-a2d8-541464803353@www.fastmail.com
Lists: pgsql-performance

Hello,
I have a large table (about 13 million rows) full of customer order information. Since most of that information is for orders that have already been fulfilled, I have a partial index to help quickly zero in on the rows that have not been fulfilled. This works well, but I noticed today, when joining with another large table on its primary key, that even though the planner was using my partial index, it chose a merge join with the second large table instead of the nested loop I would have expected.
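
For reference, the index was created with roughly the following definition (reconstructed from the \d output further down, so treat it as a sketch):

=> create index orderitems_committed_unfulfilled on orderitems (id)
     where LEAST(committed, quantity) > fulfilled;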

Looking at it in more detail, I found that the planner is assuming that I'll get millions of rows back even when I do a simple query that does an index scan on my partial index:

=> \d orderitems_committed_unfulfilled
Index "public.orderitems_committed_unfulfilled"
 Column |  Type  | Key? | Definition
--------+--------+------+------------
 id     | bigint | yes  | id
btree, for table "public.orderitems", predicate (LEAST(committed, quantity) > fulfilled)

=> explain (analyze, buffers) select oi.id from orderitems oi where LEAST(oi.committed, oi.quantity) > oi.fulfilled;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using orderitems_committed_unfulfilled on orderitems oi (cost=0.41..31688.23 rows=2861527 width=8) (actual time=0.039..2.092 rows=2274 loops=1)
Heap Fetches: 2493
Buffers: shared hit=1883
Planning Time: 0.110 ms
Execution Time: 2.255 ms
(5 rows)

So nice and quick, but the planner thought it would get back 2861527 rows instead of the 2274 I actually have. That explains why it thought it would make sense to do a merge join with my other large table instead of a nested loop over the 2k rows. I would have expected the planner to know that there's no way it will get back nearly 3 million rows, though, given that:

=> select relname, relpages, reltuples from pg_class where relname = 'orderitems_committed_unfulfilled';
             relname              | relpages | reltuples
----------------------------------+----------+-----------
 orderitems_committed_unfulfilled |     3051 |      2112
(1 row)

It knows there are only 2k-ish of them from the index. The ~2.9 million estimate is the same as what the planner expects if I disable using indexes and it does a seq scan, so I'm assuming it's just the guess from the column statistics, and that the planner is not using the size of the partial index.
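
For completeness, this is roughly how I compared against the seq scan estimate, by turning off the index plan types for the session:

=> set enable_indexscan = off;
=> set enable_indexonlyscan = off;
=> set enable_bitmapscan = off;
-- this produces a Seq Scan plan with the same multi-million row estimate
=> explain select oi.id from orderitems oi where LEAST(oi.committed, oi.quantity) > oi.fulfilled;
=> reset enable_indexscan;
=> reset enable_indexonlyscan;
=> reset enable_bitmapscan;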

I'm running:
=> select version();
version
-------------------------------------------------------------------------------------------
PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc, a 97d579287 p 0be2109a97, 64-bit
(1 row)

I'm wondering if the behavior I'm seeing is expected in 12.4, and if so, whether it changes in a later version or whether I should file an enhancement request. Or, if it's not expected, is there something I'm doing wrong, or should I file a bug?

Thanks for your time.

--
Olivier Poquet
opoquet(at)plumdev(dot)com
