Re: a wrong index choose when statistics is out of date

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a wrong index choose when statistics is out of date
Date: 2024-03-05 05:24:29
Message-ID: 87le6xb5j9.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


David Rowley <dgrowleyml(at)gmail(dot)com> writes:

> On Tue, 5 Mar 2024 at 00:37, Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
>>
>> David Rowley <dgrowleyml(at)gmail(dot)com> writes:
>> > I don't think it would be right to fudge the costs in any way, but I
>> > think the risk factor for IndexPaths could take into account the
>> > number of unmatched index clauses and increment the risk factor, or
>> > "certainty_factor" as it is currently in my brain-based design. That
>> > way add_path() would be more likely to prefer the index that matches
>> > the most conditions.
>>
>> This is somehow similar with my proposal at [1]? What do you think
>> about the treat 'col op const' as 'col op $1' for the marked column?
>> This could just resolve a subset of questions in your mind, but the
>> method looks have a solid reason.
>
> Do you mean this?

Yes, it is not cautious enough to say "similar" too quick.

After reading your opinion again, I think what you are trying to do is
adding one more dimension to Path compared with the existing cost and
pathkey information and it would take effects on add_path stage. That is
impressive, and I'm pretty willing to do more testing once the v1 is
done.

I just noted you have expressed your idea about my proposal 1,

> We should do anything like add column options in the meantime. Those
> are hard to remove once added.

I will try it very soon. and I'm a bit of upset no one care about my
proposal 2 which is the AI method, I see many companies want to
introduce AI to planner even I don't seen any impressive success, but
this user case looks like a candidate.

>> + /*
>> + * To make the planner more robust to handle some inaccurate statistics
>> + * issue, we will add a extra cost to qpquals so that the less qpquals
>> + * the lower cost it has.
>> + */
>> + cpu_run_cost += 0.01 * list_length(qpquals);
>
> I don't think it's a good idea to add cost penalties like you proposed
> there. This is what I meant by "I don't think it would be right to
> fudge the costs in any way".
>
> If you modify the costs to add some small penalty so that the planner
> is more likely to favour some other plan, what happens if we then
> decide the other plan has some issue and we want to penalise that for
> some other reason? Adding the 2nd penalty might result in the original
> plan choice again. Which one should be penalised more? I think the
> uncertainty needs to be tracked separately.
>
> Fudging the costs like this is also unlikely to play nicely with
> add_path's use of STD_FUZZ_FACTOR. There'd be an incentive to do
> things like total_cost *= STD_FUZZ_FACTOR; to ensure we get a large
> enough penalty.

I agree and I just misunderstood your proposal yesterday.

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2024-03-05 05:30:25 Re: POC, WIP: OR-clause support for indexes
Previous Message Shlok Kyal 2024-03-05 05:17:14 Re: speed up a logical replica setup