From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Andy Fan <zhihuifan1213(at)163(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: a wrong index choose when statistics is out of date |
Date: | 2024-03-04 05:33:06 |
Message-ID: | CAApHDvq9QWL6az9veW1-HaW4LwGenW0V8YyKXFLZYoaF_4AQ+Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, 3 Mar 2024 at 20:08, Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
> The issue can be reproduced with the following steps:
>
> create table x_events (.., created_at timestamp, a int, b int);
>
> create index idx_1 on t(created_at, a);
> create index idx_2 on t(created_at, b);
>
> query:
> select * from t where create_at = current_timestamp and b = 1;
>
> index (created_at, a) rather than (created_at, b) may be chosen for the
> above query if the statistics think "create_at = current_timestamp" has
> no rows, then both index are OK, actually it is true just because
> statistics is out of date.
I don't think there's really anything too special about the fact that
the created_at column is always increasing. We commonly get 1-row
estimates after multiplying the selectivities from individual stats.
Your example just seems like yet another reason that this could
happen.
I've been periodically talking about introducing "risk" as a factor
that the planner should consider. I did provide some detail in [1]
about the design that was in my head at that time. I'd not previously
thought that it could also solve this problem, but after reading your
email, I think it can.
I don't think it would be right to fudge the costs in any way, but I
think the risk factor for IndexPaths could take into account the
number of unmatched index clauses and increment the risk factor, or
"certainty_factor" as it is currently in my brain-based design. That
way add_path() would be more likely to prefer the index that matches
the most conditions.
The exact maths to calculate the certainty_factor for this case I
don't quite have worked out yet. I plan to work on documenting the
design of this and try and get a prototype patch out sometime during
this coming southern hemisphere winter so that there's at least a full
cycle of feedback opportunity before the PG18 freeze.
We should do anything like add column options in the meantime. Those
are hard to remove once added.
David
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2024-03-04 05:33:52 | Re: Some shared memory chunks are allocated even if related processes won't start |
Previous Message | Amul Sul | 2024-03-04 05:32:50 | Re: Add system identifier to backup manifest |