From: | James Coleman <jtc331(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Maxim Ivanov <hi(at)yamlcoder(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)bowt(dot)ie> |
Subject: | Re: Use of additional index columns in rows filtering |
Date: | 2023-06-21 12:45:19 |
Message-ID: | CAAaqYe_=eEPNArD-Vi9C3ugEkpnPvfWO5LV0NWEhW159d1WVUg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
I've cc'd Jeff Davis on this due to a conversation we had at PGCon
about applying filters on index tuples during index scans.
I've also cc'd Andres Freund because I think this relates to his
musing in [1] that:
> One thing I have been wondering around this is whether we should not have
> split the code for IOS and plain indexscans...
I think I remember Peter Geoghegan also wondering (I can't remember if
this was in conversation at PGCon about index skip scans or in a
hackers thread) about how we compose these various index scan
optimizations.
To be certain this is probably a thing to tackle as a follow-on to
this patch, but it does seem to me that what we are implicitly
realizing is that (unlike with bitmap scans, I think) it doesn't
really make a lot of conceptual sense to have index only scans be a
separate node from index scans. Instead it's likely better to consider
it an optimization to index scans that can dynamically kick in when
it's able to be of use. That would allow it to compose with e.g.
prefetching in the aforelinked thread. At the very least we would need
pragmatic (e.g., cost of dynamically applying optimizations) rather
than conceptual reasons to argue they should continue to be separate.
Apologies for that lengthy preamble; on to the patch under discussion:
On Thu, Jun 8, 2023 at 1:34 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> Hi,
>
> I took a stab at this and implemented the trick with the VM - during
> index scan, we also extract the filters that only need the indexed
> attributes (just like in IOS). And then, during the execution we:
>
> 1) scan the index using the scan keys (as before)
>
> 2) if the heap page is all-visible, we check the new filters that can
> be evaluated on the index tuple
>
> 3) fetch the heap tuple and evaluate the filters
Thanks for working on this; I'm excited about this class of work
(along with index prefetching and other ideas I think there's a lot of
potential for improving index scans).
> This is pretty much exactly the same thing we do for IOS, so I don't see
> why this would be incorrect while IOS is correct.
>
> This also adds "Index Filter" to explain output, to show which filters
> are executed on the index tuple (at the moment the filters are a subset
> of "Filter"), so if the index tuple matches we'll execute them again on
> the heap tuple. I guess that could be fixed by having two "filter"
> lists, depending on whether we were able to evaluate the index filters.
Given that we show index filters and heap filters separately it seems
like we might want to maintain separate instrumentation counts of how
many tuple were filtered by each set of filters.
> Most of the patch is pretty mechanical - particularly the planning part
> is about identifying filters that can be evaluated on the index tuple,
> and that code was mostly shamelessly copied from index-only scan.
>
> The matching of filters to index is done in check_index_filter(), and
> it's simpler than match_clause_to_indexcol() as it does not need to
> consider operators etc. (I think). But maybe it should be careful about
> other things, not sure.
This would end up requiring some refactoring of the existing index
matching code (or alternative caching on IndexOptInfo), but
match_filter_to_index() calling check_index_filter() results in
constructs a bitmapset of index columns for every possible filter
which seems wasteful (I recognize this is a bit of a proof-of-concept
level v1).
> The actual magic happens in IndexNext (nodeIndexscan.c). As mentioned
> earlier, the idea is to check VM and evaluate the filters on the index
> tuple if possible, similar to index-only scans. Except that we then have
> to fetch the heap tuple. Unfortunately, this means the code can't use
> index_getnext_slot() anymore. Perhaps we should invent a new variant
> that'd allow evaluating the index filters in between.
It does seem there are some refactoring opportunities there.
> With the patch applied, the query plan changes from:
>
> ...
>
> to
>
> QUERY PLAN
> -------------------------------------------------------------------
> Limit (cost=0.42..3662.15 rows=1 width=12)
> (actual time=13.663..13.667 rows=0 loops=1)
> Buffers: shared hit=544
> -> Index Scan using t_a_include_b on t
> (cost=0.42..3662.15 rows=1 width=12)
> (actual time=13.659..13.660 rows=0 loops=1)
> Index Cond: (a > 1000000)
> Index Filter: (b = 4)
> Rows Removed by Index Recheck: 197780
> Filter: (b = 4)
> Buffers: shared hit=544
> Planning Time: 0.105 ms
> Execution Time: 13.690 ms
> (10 rows)
>
> ...
I did also confirm that this properly identifies cases Jeff had
mentioned to me like "Index Filter: (((a * 2) > 500000) AND ((b % 10)
= 4))".
I noticed also you still had questions/TODOs about handling index
scans for join clauses.
Regards,
James Coleman
1: https://www.postgresql.org/message-id/20230609000600.syqy447e6metnvyj%40awork3.anarazel.de
From | Date | Subject | |
---|---|---|---|
Next Message | James Coleman | 2023-06-21 12:51:39 | Opportunistically pruning page before update |
Previous Message | Ranier Vilela | 2023-06-21 12:16:01 | Re: Making empty Bitmapsets always be NULL |