From: | Harmen <harmen(at)lijzij(dot)de> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | row estimate for partial index |
Date: | 2023-01-13 15:01:59 |
Message-ID: | 20230113150159.GJ91568@arp.lijzij.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi,
I'm trying to figure out why a certain query doesn't use a partial index.
The context:
The table has about 100M rows with "contacts". Every contact belongs to an
"org", and can be marked "deleted". Since we're generally only interested in
non-deleted contacts we have a (partial) index on the table:
"contact_organization_id" btree (org_id, id) WHERE deleted IS NULL
for queries such as: "select * from contacts where org_id=123 and deleted is null order by id"
Works well enough. However, we now have an org_id which has > 10% of the rows,
but only a handful rows where "deleted is null" matches (so the org has a lot
of "deleted" contacts). The planner doesn't like this and it falls back to a
full table scan for the above query.
I've added a dedicated index just for that org_id, to see if that helps:
"org123" btree (id) WHERE deleted IS NULL AND org_id = 123
The planner seems to use it now, however the row estimate is way off:
my_db=> explain SELECT c.id, COALESCE(c.first_name, '') FROM contacts AS c WHERE c.organization_id = 123 AND c.deleted IS NULL ORDER BY id ASC;
QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────
Gather Merge (cost=1761204.72..1883324.98 rows=1046672 width=36)
Workers Planned: 2
-> Sort (cost=1760204.70..1761513.04 rows=523336 width=36)
Sort Key: id
-> Parallel Bitmap Heap Scan on contact c (cost=318.13..1696183.14 rows=523336 width=36)
Recheck Cond: ((deleted IS NULL) AND (organization_id = 8448))
-> Bitmap Index Scan on org123 (cost=0.00..4.13 rows=1256006 width=0)
(7 rows)
Time: 3.337 ms
That "1256006" estimate from the org123 index is wrong. There are 7 contacts in
that org:
-> Bitmap Index Scan on org8448 (cost=0.00..4.13 rows=1256006 width=0) (actual time=0.841..0.841 rows=7 loops=1)
Is this expected? Is there a way to improve this?
Thanks!
Harmen
From | Date | Subject | |
---|---|---|---|
Next Message | Ron | 2023-01-13 16:12:30 | Directly embedding a psql SET variable inside another string? |
Previous Message | Dimitrios Apostolou | 2023-01-13 12:49:15 | Re: Why is a hash join preferred when it does not fit in work_mem |