row estimate for partial index

From: Harmen <harmen(at)lijzij(dot)de>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: row estimate for partial index
Date: 2023-01-13 15:01:59
Message-ID: 20230113150159.GJ91568@arp.lijzij.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I'm trying to figure out why a certain query doesn't use a partial index.

The context:
The table has about 100M rows with "contacts". Every contact belongs to an
"org", and can be marked "deleted". Since we're generally only interested in
non-deleted contacts we have a (partial) index on the table:

"contact_organization_id" btree (org_id, id) WHERE deleted IS NULL

for queries such as: "select * from contacts where org_id=123 and deleted is null order by id"

Works well enough. However, we now have an org_id which has > 10% of the rows,
but only a handful rows where "deleted is null" matches (so the org has a lot
of "deleted" contacts). The planner doesn't like this and it falls back to a
full table scan for the above query.

I've added a dedicated index just for that org_id, to see if that helps:

"org123" btree (id) WHERE deleted IS NULL AND org_id = 123

The planner seems to use it now, however the row estimate is way off:

my_db=> explain SELECT c.id, COALESCE(c.first_name, '') FROM contacts AS c WHERE c.organization_id = 123 AND c.deleted IS NULL ORDER BY id ASC;
QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────
Gather Merge (cost=1761204.72..1883324.98 rows=1046672 width=36)
Workers Planned: 2
-> Sort (cost=1760204.70..1761513.04 rows=523336 width=36)
Sort Key: id
-> Parallel Bitmap Heap Scan on contact c (cost=318.13..1696183.14 rows=523336 width=36)
Recheck Cond: ((deleted IS NULL) AND (organization_id = 8448))
-> Bitmap Index Scan on org123 (cost=0.00..4.13 rows=1256006 width=0)
(7 rows)

Time: 3.337 ms

That "1256006" estimate from the org123 index is wrong. There are 7 contacts in
that org:
-> Bitmap Index Scan on org8448 (cost=0.00..4.13 rows=1256006 width=0) (actual time=0.841..0.841 rows=7 loops=1)

Is this expected? Is there a way to improve this?

Thanks!
Harmen

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ron 2023-01-13 16:12:30 Directly embedding a psql SET variable inside another string?
Previous Message Dimitrios Apostolou 2023-01-13 12:49:15 Re: Why is a hash join preferred when it does not fit in work_mem