Re: index prefetching

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: index prefetching
Date: 2024-01-16 20:10:23
Message-ID: 2a4024ec-d12c-449c-97cd-a64e77dc7067@garret.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 16/01/2024 6:25 pm, Tomas Vondra wrote:
> On 1/16/24 09:13, Konstantin Knizhnik wrote:
>> Hi,
>>
>> On 12/01/2024 6:42 pm, Tomas Vondra wrote:
>>> Hi,
>>>
>>> Here's an improved version of this patch, finishing a lot of the stuff
>>> that I alluded to earlier - moving the code from indexam.c, renaming a
>>> bunch of stuff, etc. I've also squashed it into a single patch, to make
>>> it easier to review.
>> I am thinking about testing you patch with Neon (cloud Postgres). As far
>> as Neon seaprates compute and storage, prefetch is much more critical
>> for Neon
>> architecture than for vanilla Postgres.
>>
>> I have few complaints:
>>
>> 1. It disables prefetch for sequential access pattern (i.e. INDEX
>> MERGE), motivating it that in this case OS read-ahead will be more
>> efficient than prefetch. It may be true for normal storage devices, bit
>> not for Neon storage and may be also for Postgres on top of DFS (i.e.
>> Amazon RDS). I wonder if we can delegate decision whether to perform
>> prefetch in this case or not to some other level. I do not know
>> precisely where is should be handled. The best candidate IMHO is
>> storager manager. But it most likely requires extension of SMGR API. Not
>> sure if you want to do it... Straightforward solution is to move this
>> logic to some callback, which can be overwritten by user.
>>
> Interesting point. You're right these decisions (whether to prefetch
> particular patterns) are closely tied to the capabilities of the storage
> system. So it might make sense to maybe define it at that level.
>
> Not sure what exactly RDS does with the storage - my understanding is
> that it's mostly regular Postgres code, but managed by Amazon. So how
> would that modify the prefetching logic?

Amazon RDS is just vanilla Postgres with file system mounted on EBS
(Amazon  distributed file system).
EBS provides good throughput but larger latencies comparing with local SSDs.
I am not sure if read-ahead works for EBS.

> 4. I think that performing prefetch at executor level is really great
>> idea and so prefetch can be used by all indexes, including custom
>> indexes. But prefetch will be efficient only if index can provide fast
>> access to next TID (located at the same page). I am not sure that it is
>> true for all builtin indexes (GIN, GIST, BRIN,...) and especially for
>> custom AM. I wonder if we should extend AM API to make index make a
>> decision weather to perform prefetch of TIDs or not.
> I'm not against having a flag to enable/disable prefetching, but the
> question is whether doing prefetching for such indexes can be harmful.
> I'm not sure about that.

I tend to agree with you - it is hard to imagine index implementation
which doesn't win from prefetching heap pages.
May be only the filtering case you have mentioned. But it seems to me
that current B-Tree index scan (not IOS) implementation in Postgres
doesn't try to use index tuple to check extra condition - it will fetch
heap tuple in any case.

>> 5. Minor notice: there are few places where index_getnext_slot is called
>> with last NULL parameter (disabled prefetch) with the following comment
>> "XXX Would be nice to also benefit from prefetching here." But all this
>> places corresponds to "point loopkup", i.e. unique constraint check,
>> find replication tuple by index... Prefetch seems to be unlikely useful
>> here, unlkess there is index bloating and and we have to skip a lot of
>> tuples before locating right one. But should we try to optimize case of
>> bloated indexes?
>>
> Are you sure you're looking at the last patch version? Because the
> current patch does not have any new parameters in index_getnext_* and
> the comments were removed too (I suppose you're talking about
> execIndexing, execReplication and those places).
>
Sorry, I looked at v20240103-0001-prefetch-2023-12-09.patch , I didn't
noticed v20240112-0001-Prefetch-heap-pages-during-index-scans.patch

> regards
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2024-01-16 20:10:37 Re: UUID v7
Previous Message David G. Johnston 2024-01-16 19:55:23 New Window Function: ROW_NUMBER_DESC() OVER() ?