From: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com> |
Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond. |
Date: | 2020-05-27 13:58:04 |
Message-ID: | CAKU4AWrkL+1brQ1Sd-GotV=8=gYaCNARurPkJek76m9sPwJOGQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, May 27, 2020 at 8:01 PM Ashutosh Bapat <
ashutosh(dot)bapat(at)2ndquadrant(dot)com> wrote:
>
>
> On Wed, 27 May 2020 at 04:43, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
>
>> You can use the attached SQL to reproduce this issue, but you may not
>> get the above result on the first try: when the optimizer thinks the
>> two index scans have the same cost, it chooses the first one it finds,
>> and that order depends on RelationGetIndexList. If so, you may try
>> dropping and re-creating the j1_i_im5 index.
>>
>> The idea behind this patch is that we still use the cost-based
>> optimizer; only when we find that two index scans have the same cost
>> do we prefer the index with more quals in the Index Cond. This is
>> implemented by making the qual cost on the index Filter slightly
>> higher.
>>
>
> Thanks for the example and the explanation.
>
> The execution time difference in your example is pretty high to account
> for executing the filter on so many rows. My guess is this has to do with
> the heap access. For applying the filter the entire row needs to be fetched
> from the heap. So we should investigate this case from that angle. Another
> guess I have is that the statistics are not correct and hence the cost is
> wrong.
>
>
I believe this is a statistics issue, and the cost is wrong as a result.
Further characteristics of this issue are:

1). If a value is outside the range covered by the old statistics, the
optimizer assumes it matches 1 row.
2). Based on that 1-row assumption, for the query "col1 = out_of_range_val
AND col2 = any_value", the indexes (col1, col2) and (col1, col3) have
exactly the same cost under the current cost model.
3). If the statistics are wrong, (col1, col3) may be a very bad plan, as
shown above, but the index (col1, col2) should always be better than, or
at least no worse than, (col1, col3) in any case.
4). Extending the rule: for the query "col1 = out_of_range_val AND
col2 = any_value AND col3 = any_val" with indexes (col1, col2, col_m)
and (col1, col_m, col_n), the former index will always be better than,
or no worse than, the latter one.
5). A statistics issue like this is not uncommon. For example, in a
log-based application, creation_date very easily goes beyond the range
covered by the statistics.

So we need to optimize the cost model for such cases, and the method is
the patch I mentioned above. I don't have solid data to prove that Oracle
does something similar, but based on discussions with my customer, Oracle
likely does something like this.
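
To make points 1)-4) concrete, here is a minimal sketch of the
out-of-range-statistics scenario. The table, index, and column names are
hypothetical (this is not the attached SQL), and the exact plans depend
on your data and settings:

```sql
-- Hypothetical reproduction of the stale-statistics scenario.
CREATE TABLE j1 (col1 int, col2 int, col3 int);
INSERT INTO j1
SELECT i % 100, i % 10, i % 10 FROM generate_series(1, 100000) i;
CREATE INDEX j1_c1_c2 ON j1 (col1, col2);
CREATE INDEX j1_c1_c3 ON j1 (col1, col3);
ANALYZE j1;   -- statistics now say col1 ranges over [0, 99]

-- New data outside the range the statistics know about; no ANALYZE,
-- so the planner still estimates ~1 row for col1 = 1000.
INSERT INTO j1
SELECT 1000, i % 10, i % 10 FROM generate_series(1, 50000) i;

-- Under the 1-row assumption both indexes cost the same, although
-- (col1, col2) puts col2 = 3 in the Index Cond while (col1, col3)
-- must fetch every col1 = 1000 heap tuple and apply col2 = 3 as a
-- Filter. Which index is picked can depend on RelationGetIndexList
-- ordering.
EXPLAIN SELECT * FROM j1 WHERE col1 = 1000 AND col2 = 3;
```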
--
Best Regards
Andy Fan