Re: Use of additional index columns in rows filtering

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Maxim Ivanov <hi(at)yamlcoder(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, markus(dot)winand(at)winand(dot)at
Subject: Re: Use of additional index columns in rows filtering
Date: 2023-08-09 16:05:42
Message-ID: 97e01eae-5e25-439a-b7a8-4e6ee9aa66a8@enterprisedb.com
Lists: pgsql-hackers

On 8/8/23 23:03, Peter Geoghegan wrote:
> On Tue, Aug 8, 2023 at 1:49 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> So we expect 1250 rows. If that was accurate, the index scan would have
>> to do 1250 heap fetches. It's just luck the index scan doesn't need to
> do that. I don't think there's a chance to improve this costing - if the
>> inputs are this off, it can't do anything.
>
> Well, that depends. If we can find a way to make the bitmap index scan
> capable of doing something like the same trick through other means, in
> some other patch, then this particular problem (involving a simple
> inequality) just goes away. There may be other cases that look a
> little similar, with a more complicated expression, where it just
> isn't reasonable to expect a bitmap index scan to compete. Ideally,
> bitmap index scans will only be at a huge disadvantage when it just
> makes sense, due to the particulars of the expression.
>
> I'm not trying to make this your problem. I'm just trying to establish
> the general nature of the problem.
>
>> Also, I think this is related to the earlier discussion about maybe
>> costing it according to the worst case - i.e. as if we still needed
> to fetch the same number of heap tuples as before, which will inevitably
> lead to similar issues, with worse plans looking cheaper.
>
> Not in those cases where it just doesn't come up, because we can
> totally avoid visibility checks. As I said, securing that guarantee
> has the potential to make the costing a lot more reliable/easier to
> implement.
>

But in the example you shared yesterday, the problem is not really about
visibility checks. In fact, the index scan costing completely ignores
the VM checks - it didn't matter before, and the patch did not change
this. It's about the number of rows the index scan is expected to
produce - each of those rows will always require a random I/O, and we
can't skip those.
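A deliberately simplified model may make the argument concrete. It is a hypothetical sketch, not PostgreSQL's actual cost_index() logic (which also accounts for index/heap correlation, caching and CPU costs). It assumes that every row the index scan is expected to return costs one random heap page read at the default random_page_cost of 4.0. Under that assumption the row estimate alone drives the heap-fetch cost, and ignoring or skipping visibility-map checks leaves that term unchanged.

```python
# Hypothetical, simplified model: NOT PostgreSQL's cost_index().
# Assumption: each row the index scan returns needs one random heap page read.
RANDOM_PAGE_COST = 4.0  # default value of PostgreSQL's random_page_cost setting


def heap_fetch_cost(expected_rows: int, random_page_cost: float = RANDOM_PAGE_COST) -> float:
    """Charge one random heap page read per row the index scan is expected to return."""
    if expected_rows < 0:
        raise ValueError("expected_rows must be non-negative")
    return expected_rows * random_page_cost


# The planner charges for the 1250 rows it expects to fetch from the heap.
estimated = heap_fetch_cost(1250)
# Hypothetical actual row count, chosen only for illustration.
actual = heap_fetch_cost(10)
print(estimated)  # 5000.0
print(actual)     # 40.0
```

If the estimate is far off, the charged cost is off by the same factor. No adjustment to the visibility-check side of the model can fix that, which is the sense in which the costing cannot do better when its inputs are this wrong.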

>> That is certainly true - I'm trying to keep the scope somewhat close to
>> the original goal. Obviously, there may be additional things the patch
>> really needs to consider, but I'm not sure this is one of those cases
>> (perhaps I just don't understand what the issue is - the example seems
>> like a run-of-the-mill case of poor estimate / costing).
>
> I'm not trying to impose any particular interpretation here. It's
> early in the cycle, and my questions are mostly exploratory. I'm still
> trying to develop my own understanding of the trade-offs in this area.
>

Understood. I think this whole discussion is about figuring out these
trade-offs and also how to divide the various improvements into "minimum
viable" changes.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
