Re: a wrong index choose when statistics is out of date

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a wrong index choose when statistics is out of date
Date: 2024-03-04 05:33:06
Message-ID: CAApHDvq9QWL6az9veW1-HaW4LwGenW0V8YyKXFLZYoaF_4AQ+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 3 Mar 2024 at 20:08, Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
> The issue can be reproduced with the following steps:
>
> create table x_events (.., created_at timestamp, a int, b int);
>
> create index idx_1 on t(created_at, a);
> create index idx_2 on t(created_at, b);
>
> query:
> select * from t where create_at = current_timestamp and b = 1;
>
> index (created_at, a) rather than (created_at, b) may be chosen for the
> above query if the statistics think "create_at = current_timestamp" has
> no rows, then both index are OK, actually it is true just because
> statistics is out of date.

I don't think there's really anything too special about the fact that
the created_at column is always increasing. We commonly get 1-row
estimates after multiplying the selectivities from individual stats.
Your example just seems like yet another reason that this could
happen.

I've been periodically talking about introducing "risk" as a factor
that the planner should consider. I did provide some detail in [1]
about the design that was in my head at that time. I'd not previously
thought that it could also solve this problem, but after reading your
email, I think it can.

I don't think it would be right to fudge the costs in any way, but I
think the risk factor for IndexPaths could take into account the
number of unmatched index clauses and increment the risk factor, or
"certainty_factor" as it is currently in my brain-based design. That
way add_path() would be more likely to prefer the index that matches
the most conditions.

The exact maths to calculate the certainty_factor for this case I
don't quite have worked out yet. I plan to work on documenting the
design of this and try and get a prototype patch out sometime during
this coming southern hemisphere winter so that there's at least a full
cycle of feedback opportunity before the PG18 freeze.

We should do anything like add column options in the meantime. Those
are hard to remove once added.

David

[1] https://www.postgresql.org/message-id/CAApHDvo2sMPF9m%3Di%2BYPPUssfTV1GB%3DZ8nMVa%2B9Uq4RZJ8sULeQ%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-03-04 05:33:52 Re: Some shared memory chunks are allocated even if related processes won't start
Previous Message Amul Sul 2024-03-04 05:32:50 Re: Add system identifier to backup manifest