Re: a wrong index choose when statistics is out of date

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: a wrong index choose when statistics is out of date
Date: 2024-03-07 10:42:31
Message-ID: 87il1y9ut5.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> writes:

> On 5/3/2024 19:56, Andy Fan wrote:
>> I think it is OK for a design review, for the implementaion side, the
>> known issue includes:
>> 1. Support grap such infromation from its parent for partitioned table
>> if the child doesn't have such information.
>> 2. builtin document and testing.
>> Any feedback is welcome.
> Thanks for your efforts.
> I was confused when you showed the problem connected to clauses like
> "Var op Const" and "Var op Param".

hmm, then what is the soluation in your mind when you say the "ticky" in
[1]? I am thinking we have some communication gap here.

> As far as I know, the estimation logic of such clauses uses MCV and
> number-distinct statistics. So, being out of MCV values, it becomes
> totally insensitive to any internal skew in data and any data outside
> the statistics boundaries.
> Having studied the example you provided with the patch, I think it is
> not a correct example:
> Difference between var_eq_const and var_eq_non_const quite obvious:

The response should be same as what I did in [2], let's see if we can
make the gap between us smaller.

> In the second routine, you don't have information about the const value
> and can't use MCV for estimation. Also, you can't exclude MCV values
> from the estimation. And it is just luck that you've got the right
> answer. I think if you increased the weight of the unknown part, you
> would get a bad result, too.

> I would like to ask David why the var_eq_const estimator doesn't have an
> option for estimation with a histogram. Having that would relieve a
> problem with skewed data. Detecting the situation with incoming const
> that is out of the covered area would allow us to fall back to ndistinct
> estimation or something else. At least, histogram usage can be
> restricted by the reltuples value and ratio between the total number of
> MCV values and the total number of distinct values in the table.

I think an example which show your algorithm is better would be pretty
helpful for communication.

[1] https://www.postgresql.org/message-id/15381eea-cbc3-4087-9d90-ab752292bd54%40postgrespro.ru
[2] https://www.postgresql.org/message-id/87msra9vgo.fsf%40163.com
--
Best Regards
Andy Fan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-03-07 11:02:27 Re: remaining sql/json patches
Previous Message David Rowley 2024-03-07 10:32:26 Re: a wrong index choose when statistics is out of date