Quick Links

Re: Use of additional index columns in rows filtering

From:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Maxim Ivanov <hi(at)yamlcoder(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, markus(dot)winand(at)winand(dot)at
Subject:	Re: Use of additional index columns in rows filtering
Date:	2023-08-04 23:34:04
Message-ID:	1bd92817-2c6e-5d94-f619-114f9d07ec53@enterprisedb.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 8/4/23 04:07, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 3:04 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> Because my patch is all about reducing the heap pages, which are usually
>> the expensive part of the index scan. But you're right the "index scan"
>> with index filter may access more index pages, because it has fewer
>> "access predicates".
>
> It's not so much the unnecessary index page accesses that bother me.
> At least I didn't push that aspect very far when I constructed my
> adversarial test case -- index pages were only a small part of the
> overall problem. (I mean the problem at runtime, in the executor. The
> planner expected to save a small number of leaf page accesses, which
> was kinda, sorta the problem there -- though the planner might have
> technically still been correct about that, and can't have been too far
> wrong in any case.)
>

Thanks for the clarification, I think I understand better now.

Let me briefly sum my understanding for the two patches:

- The SAOP patch eliminates those heap accesses because it manages to
evaluate all clauses in the AM, including clauses that would previously
be treated as "table filters" and evaluated on the heap row.

- My patch achieves a similar result by evaluating the clauses as index
filters (i.e. on the index tuple). That's probably not as good as proper
access predicates, so it can't help with the index page accesses, but
better than what we had before.

There's a couple more related thoughts later in my reply.

> The real problem that my adversarial case seemed to highlight was a
> problem of extra heap page accesses. The tenk1 table's VM is less than
> one page in size, so how could it have been VM buffer hits? Sure,
> there were no "table filters" involved -- only "index filters". But
> even "index filters" require heap access when the page isn't marked
> all-visible in the VM.
>

No, the extra accesses were not because of VM buffer hits - it was
because of having to actually fetch the heap tuple for pages that are
not fully visible, which is what happens right after the insert.

The patch does what we index-only scans do - before evaluating the
filters on an index tuple, we check if the page is fully visible. If
not, we fetch the heap tuple and evaluate the filters on it.

This means even an index-only scan would behave like this too. And it
goes away as the table gets vacuumed, at which point we can eliminate
the rows using only the index tuple again.

> That problem just cannot happen with a similar plan that eliminates
> the same index tuples within the index AM proper (the index quals
> don't even have to be "access predicates" for this to apply, either).
> Of course, we never need to check the visibility of index tuples just
> to be able to consider eliminating them via nbtree search scan
> keys/index quals -- and so there is never any question of heap/VM
> access for tuples that don't pass index quals. Not so for "index
> filters", where there is at least some chance of accessing the heap
> proper just to be able to eliminate non-matches.
>

Right. This however begs a question - why would we actually need to
check the visibility map before evaluating the index filter, when the
index tuple alone is clearly good enough for the bitmapOr plan?

Because if we didn't need to do that VM check, this issue with extra
heap accesses would disappear.

I copied this from the IOS somewhat blindly, but now I'm starting to
think it was misguided. I thought it's a protection against processing
"invalid" tuples - not tuples broken after a crash (as that would be
fixed by crash recovery), but e.g. tuples with schema changes from an
aborted transaction.

But can this actually happen for indexes? For heap it's certainly
possible (BEGIN - ALTER - INSERT - ROLLBACK will leave behind tuples
like that). But we don't support changing indexes like this, right?

And if we had this issue, how come the bitmapOr plan (which ultimately
does the same thing, although entirely in the AM) does not need to do
these VM checks / heap accesses too? It's evaluating essentially the
same conditions on the index tuple ...

So I'm starting to think this just my misunderstanding of why IOS does
this VM check - it's purely to determine visibility of the result. When
it sees a pointer to page that is not all-visible, it decides it'll need
to do check visibility on a heap tuple anyway, and just fetches the heap
tuple right away. Which however ignores that the filters may eliminate
many of those tuples, so IOS could also make such unnecessary heap
accesses. It might be better to check the filters first, and only then
maybe fetch the heap tuple ...

> While I think that it makes sense to assume that "index filters" are
> strictly better than "table filters" (assuming they're directly
> equivalent in that they contain the same clause), they're not
> *reliably* any better. So "index filters" are far from being anywhere
> near as good as an equivalent index qual (AFAICT we should always
> assume that that's true). This is true of index quals generally --
> this advantage is *not* limited to "access predicate index quals". (It
> is most definitely not okay for "index filters" to displace equivalent
> "access predicate index quals", but it's also not really okay to allow
> them to displace equivalent "index filter predicate index quals" --
> the latter case is less bad, but AFAICT they both basically aren't
> acceptable "substitutions".)
>

I'm not quite sure what are the differences between "index qual" vs.
"access predicate index qual" vs. "index filter predicate index quals",
or what "dispacing" would mean exactly. But I agree there's a hierarchy
of qual types, and some "promotions" are likely guaranteed to be good.

FWIW this also reminds me that this whole discussion mostly focused on
SAOP clauses (and how they may be translated into access predicates
etc.). My patch is however way more general - it applies to all clauses,
not just SAOP ones, including clauses with no chance of evaluating at
the AM level.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Use of additional index columns in rows filtering at 2023-08-04 02:07:29 from Peter Geoghegan

Responses

Re: Use of additional index columns in rows filtering at 2023-08-05 00:53:39 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Geoghegan	2023-08-05 00:53:39	Re: Use of additional index columns in rows filtering
Previous Message	Tom Lane	2023-08-04 23:14:25	Re: Release 17 of the PostgreSQL Buildfarm Client