From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Postgres picks suboptimal index after building of an extended statistics
Date: 2022-07-07 22:07:36
Message-ID: 3119052.1657231656@sss.pgh.pa.us
Lists: pgsql-hackers

Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> writes:
> On 12/8/21 04:26, Tomas Vondra wrote:
>> I wonder if we should teach clauselist_selectivity about UNIQUE indexes,
>> and improve the cardinality estimates directly, not just costing for
>> index scans.
> I tried to implement this in different ways, but it causes additional
> overhead and code complexity: analyzing a list of indexes and matching
> each index's clauses against the input clauses in every selectivity
> estimation. I don't like that approach, so I propose the new patch in
> the attachment.
I looked at this briefly. I do not think that messing with
btcostestimate/genericcostestimate is the right response at all.
The problem can be demonstrated with no index whatever, as in the
attached shortened version of the original example. I get
                    QUERY PLAN
---------------------------------------------------
 Seq Scan on a  (cost=0.00..46.02 rows=1 width=12)
   Filter: ((x = 1) AND (y = 1) AND (z = 1))
(2 rows)
before adding the extended stats, and
                    QUERY PLAN
----------------------------------------------------
 Seq Scan on a  (cost=0.00..46.02 rows=28 width=12)
   Filter: ((x = 1) AND (y = 1) AND (z = 1))
(2 rows)
afterwards. So the extended stats have made the rowcount
estimate significantly worse, which seems like an indicator of a
bug somewhere in extended stats. The more so because I can crank
default_statistics_target all the way to 10000 without these
estimates changing. If we can't get a dead-on estimate for a
2001-row table at that stats level, we're doing something wrong,
surely?
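
For anyone who doesn't want to open the attachment, the shape of the
test is roughly as below. The table and column names come from the
plans above; the data distribution is an illustrative guess, not
necessarily what badestimate.sql actually contains:

    -- Illustrative reconstruction; badestimate.sql is authoritative.
    CREATE TABLE a (x int, y int, z int);
    -- ~2001 rows with strongly correlated columns, so the planner's
    -- independence assumption misfires on a three-way AND
    INSERT INTO a SELECT i % 10, i % 10, i % 10
      FROM generate_series(1, 2001) AS i;
    ANALYZE a;
    EXPLAIN SELECT * FROM a WHERE x = 1 AND y = 1 AND z = 1;
    -- now add extended stats and re-check the estimate
    CREATE STATISTICS a_stat (ndistinct, dependencies) ON x, y, z FROM a;
    ANALYZE a;
    EXPLAIN SELECT * FROM a WHERE x = 1 AND y = 1 AND z = 1;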
Also, I found that if I ask only for ndistinct stats,
I still get rows=1. The fishiness seems to be directly
a problem with dependencies stats.
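The two stats kinds can be tried in isolation with the standard
CREATE STATISTICS syntax (the object names here are illustrative):

    -- ndistinct only: still produces the rows=1 estimate
    CREATE STATISTICS a_nd (ndistinct) ON x, y, z FROM a;
    ANALYZE a;
    EXPLAIN SELECT * FROM a WHERE x = 1 AND y = 1 AND z = 1;
    DROP STATISTICS a_nd;
    -- dependencies only: this is where the worse estimate appears
    CREATE STATISTICS a_dep (dependencies) ON x, y, z FROM a;
    ANALYZE a;
    EXPLAIN SELECT * FROM a WHERE x = 1 AND y = 1 AND z = 1;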
regards, tom lane
Attachment: badestimate.sql (text/plain, 550 bytes)