Re: index prefetching

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: index prefetching
Date: 2024-01-21 23:39:14
Message-ID: 91f02f2e-a6b5-4dda-8e45-5dd576ac3416@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/21/24 20:50, Konstantin Knizhnik wrote:
>
> On 20/01/2024 12:14 am, Tomas Vondra wrote:
>> Looks like I was not true, even if it is not index-only scan but index
>>> condition involves only index attributes, then heap is not accessed
>>> until we find tuple satisfying search condition.
>>> Inclusive index case described above
>>> (https://commitfest.postgresql.org/46/4352/) is interesting but IMHO
>>> exotic case. If keys are actually used in search, then why not to create
>>> normal compound index instead?
>>>
>> Not sure I follow ...
>>
>> Firstly, I'm not convinced the example addressed by that other patch is
>> that exotic. IMHO it's quite possible it's actually quite common, but
>> the users do no realize the possible gains.
>>
>> Also, there are reasons to not want very wide indexes - it has overhead
>> associated with maintenance, disk space, etc. I think it's perfectly
>> rational to design indexes in a way eliminates most heap fetches
>> necessary to evaluate conditions, but does not guarantee IOS (so the
>> last heap fetch is still needed).
>
> We are comparing compound index (a,b) and covering (inclusive) index (a)
> include (b)
> This indexes have exactly the same width and size and almost the same
> maintenance overhead.
>
> First index has more expensive comparison function (involving two
> columns)  but I do not think that it can significantly affect
> performance and maintenance cost. Also if selectivity of "a" is good
> enough, then there is no need to compare "b"
>
> Why we can prefer covering index  to compound index? I see only two good
> reasons:
> 1. Extra columns type do not  have comparison function need for AM.
> 2. The extra columns are never used in query predicate.
>

Or maybe you don't want to include the columns in a UNIQUE constraint?

> If you are going to use this columns in query predicates I do not see
> much sense in creating inclusive index rather than compound index.
> Do you?
>

But this is also about conditions that can't be translated into index
scan keys. Consider this:

create table t (a int, b int, c int);
insert into t select 1000 * random(), 1000 * random(), 1000 * random()
from generate_series(1,1000000) s(i);
create index on t (a,b);
vacuum analyze t;

explain (analyze, buffers) select * from t where a = 10 and mod(b,10) =
1111111;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------
Index Scan using t_a_b_idx on t (cost=0.42..3670.74 rows=5 width=12)
(actual time=4.562..4.564 rows=0 loops=1)
Index Cond: (a = 10)
Filter: (mod(b, 10) = 1111111)
Rows Removed by Filter: 974
Buffers: shared hit=980
Prefetches: blocks=901
Planning Time: 0.304 ms
Execution Time: 5.146 ms
(8 rows)

Notice that this still fetched ~1000 buffers in order to evaluate the
filter on "b", because it's complex and can't be transformed into a nice
scan key. Or this:

explain (analyze, buffers) select a from t where a = 10 and (b+1) < 100
and c < 0;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Index Scan using t_a_b_idx on t (cost=0.42..3673.22 rows=1 width=4)
(actual time=4.446..4.448 rows=0 loops=1)
Index Cond: (a = 10)
Filter: ((c < 0) AND ((b + 1) < 100))
Rows Removed by Filter: 974
Buffers: shared hit=980
Prefetches: blocks=901
Planning Time: 0.313 ms
Execution Time: 4.878 ms
(8 rows)

where it's "broken" by the extra unindexed column.

FWIW there are the primary cases I had in mind for this patch.

>
>> What do you mean by "create normal compound index"? The patch addresses
>> a limitation that not every condition can be translated into a proper
>> scan key. Even if we improve this, there will always be such conditions.
>> The the IOS can evaluate them on index tuple, the regular index scan
>> can't do that (currently).
>>
>> Can you share an example demonstrating the alternative approach?
>
> May be I missed something.
>
> This is the example from
> https://www.postgresql.org/message-id/flat/N1xaIrU29uk5YxLyW55MGk5fz9s6V2FNtj54JRaVlFbPixD5z8sJ07Ite5CvbWwik8ZvDG07oSTN-usENLVMq2UAcizVTEd5b-o16ZGDIIU=(at)yamlcoder(dot)me :
>
> ```
>
> And here is the plan with index on (a,b).
>
> Limit (cost=0.42..4447.90 rows=1 width=12) (actual time=6.883..6.884
> rows=0 loops=1)    Output: a, b, d    Buffers: shared hit=613    ->
> Index Scan using t_a_b_idx on public.t (cost=0.42..4447.90 rows=1
> width=12) (actual time=6.880..6.881 rows=0 loops=1)          Output: a,
> b, d          Index Cond: ((t.a > 1000000) AND (t.b = 4))      
>    Buffers: shared hit=613 Planning:    Buffers: shared hit=41 Planning
> Time: 0.314 ms Execution Time: 6.910 ms ```
>
>
> Isn't it an optimal plan for this query?
>
> And cite from self reproducible example https://dbfiddle.uk/iehtq44L :
> ```
> create unique index t_a_include_b on t(a) include (b);
> -- I'd expecd index above to behave the same as index below for this query
> --create unique index on t(a,b);
> ```
>
> I agree that it is natural to expect the same result for both indexes.
> So this PR definitely makes sense.
> My point is only that compound index (a,b) in this case is more natural
> and preferable.
>

Yes, perhaps. But you may also see it from the other direction - if you
already have an index with included columns (for whatever reason), it
would be nice to leverage that if possible. And as I mentioned above,
it's not always the case that move a column from "included" to a proper
key, or stuff like that.

Anyway, it seems entirely unrelated to this prefetching thread.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-01-21 23:47:27 Re: index prefetching
Previous Message Peter Smith 2024-01-21 23:38:08 Re: pg_rewind WAL segments deletion pitfall