From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>
Subject: Re: index prefetching
Date: 2023-12-21 15:20:45
Message-ID: afd17f3c-d496-5583-e93a-5a162d37235f@enterprisedb.com
Lists: pgsql-hackers
On 12/21/23 14:43, Andres Freund wrote:
> Hi,
>
> On 2023-12-21 13:30:42 +0100, Tomas Vondra wrote:
>> You're right that a lot of this is guesswork. I don't think we can do much
>> better, because it depends on stuff that's out of our control - each OS
>> may do things differently, or perhaps it's just configured differently.
>>
>> But I don't think this is really a serious issue - all the read-ahead
>> implementations need to work about the same, because they are meant to
>> work in a transparent way.
>>
>> So it's about deciding at which point we think this is a sequential
>> pattern. Yes, the OS may use a slightly different threshold, but the
>> exact value does not really matter - in the worst case we prefetch a
>> couple more/fewer blocks.
>>
>> The OS read-ahead can't really prefetch anything except sequential
>> cases, so the whole question is "When does the access pattern get
>> sequential enough?". I don't think there's a perfect answer, and I don't
>> think we need a perfect one - we just need to be reasonably close.
>
> For the streaming read interface (initially backed by fadvise, to then be
> replaced by AIO) we found that it's clearly necessary to avoid fadvises in
> cases of actual sequential IO - the overhead otherwise leads to easily
> reproducible regressions. So I don't think we have much choice.
>
Yeah, the regressions are pretty easy to demonstrate. In fact, I didn't
have such detection in the first patch, but after the first round of
benchmarks it became obvious it was needed.
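FWIW the detection is essentially just tracking how long a run of
consecutive blocks we've seen, and skipping the fadvise once the run
gets long enough. A simplified sketch (made-up names and threshold,
not the actual patch code):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* made-up names; the real patch keeps a short history of recent blocks */
typedef struct SeqDetect
{
    BlockNumber last_block;   /* previous block requested */
    int         seq_run;      /* consecutive sequential requests so far */
} SeqDetect;

#define SEQ_RUN_THRESHOLD 4   /* arbitrary cutoff, for illustration only */

/*
 * Return true when the pattern looks sequential, i.e. when we should
 * skip posix_fadvise() and let the OS read-ahead handle it.
 */
static bool
looks_sequential(SeqDetect *sd, BlockNumber blkno)
{
    if (blkno == sd->last_block + 1)
        sd->seq_run++;
    else
        sd->seq_run = 0;

    sd->last_block = blkno;

    return sd->seq_run >= SEQ_RUN_THRESHOLD;
}

As discussed above, the exact threshold does not matter much - the OS
may use a slightly different one, and in the worst case we prefetch a
couple more/fewer blocks.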
>
>> Also, while I don't want to lazily dismiss valid cases that might be
>> affected by this, I think that sequential access for index paths is not
>> that common (with the exception of clustered indexes).
>
> I think sequential access is common in other cases as well. There's lots of
> indexes where heap tids are almost perfectly correlated with index entries,
> consider insert-only tables and serial PKs or inserted_at
> timestamp columns. Even leaving those aside, for indexes with many entries
> for the same key, we sort by tid these days, which will also result in
> "runs" of sequential access.
>
True. I should have thought about those cases.
>
>> Obviously, the latter case has much more severe impact, but it depends
>> on the exact workload / access pattern etc. The only "perfect" solution
>> would be to actually check the page cache, but well - that seems to be
>> fairly expensive.
>
>> What I was envisioning was something self-tuning, based on the I/O we
>> may do later. If the prefetcher decides to prefetch something, but finds
>> it's already in cache, we'd increase the distance, to remember more
>> blocks. Likewise, if a block is not prefetched but then requires I/O
>> later, decrease the distance. That'd make it adaptive, but I don't think
>> we actually have the info about I/O.
>
> How would the prefetcher know that the data wasn't in cache?
>
I don't think there's a good way to do that, unfortunately, or at least
I'm not aware of one. That's what I meant by "we don't have the info" at
the end, which is why I haven't tried implementing it.

The only "solution" I could come up with was timing the I/O requests and
deducing from the duration what was already cached. Not great, of course.
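In pseudo-code, the timing idea would be something like this (entirely
hypothetical, including the 50us cutoff - a real version would need
per-device calibration):

#include <stdbool.h>
#include <time.h>

/* made-up cutoff: reads completing faster than this probably came from
 * the page cache rather than the device */
#define CACHED_READ_NS 50000L

static long
read_elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    return (end->tv_sec - start->tv_sec) * 1000000000L
         + (end->tv_nsec - start->tv_nsec);
}

/*
 * Adjust how far ahead the prefetcher looks, based on whether the
 * timing suggests the block was already cached.
 */
static void
adjust_distance(int *distance, long elapsed_ns, bool was_prefetched)
{
    bool likely_cached = (elapsed_ns < CACHED_READ_NS);

    if (was_prefetched && likely_cached)
        (*distance)++;   /* the fadvise was pointless, remember more blocks */
    else if (!was_prefetched && !likely_cached)
        (*distance)--;   /* we missed a block that needed real I/O */
}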
>
>> Alternatively, I was thinking about moving the prefetches into a
>> separate worker process (or multiple workers), so we'd just queue the
>> request and all the overhead would be done by the worker. The main
>> problem is the overhead of calling posix_fadvise() for blocks that are
>> already in memory, and this would just move it to a separate backend. I
>> wonder if that might even make the custom cache unnecessary / optional.
>
> The AIO patchset provides this.
>
OK, I guess it's time for me to take a look at the patch again.
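Just to illustrate the shape I had in mind - a toy sketch with a plain
thread standing in for the background worker, and none of the
shared-memory plumbing a real implementation would need:

#define _POSIX_C_SOURCE 200112L

#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/types.h>

#define QUEUE_SIZE 1024
#define BLCKSZ 8192

typedef struct PrefetchQueue
{
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    uint32_t        blocks[QUEUE_SIZE];
    int             head;
    int             tail;
    int             fd;         /* relation file descriptor */
} PrefetchQueue;

/* backend side: cheap enqueue, just drop the request if the queue is full */
static void
enqueue_prefetch(PrefetchQueue *q, uint32_t blkno)
{
    pthread_mutex_lock(&q->lock);
    int next = (q->tail + 1) % QUEUE_SIZE;

    if (next != q->head)        /* not full */
    {
        q->blocks[q->tail] = blkno;
        q->tail = next;
        pthread_cond_signal(&q->nonempty);
    }
    pthread_mutex_unlock(&q->lock);
}

/* worker side: all the posix_fadvise() overhead is paid here */
static void *
prefetch_worker(void *arg)
{
    PrefetchQueue *q = (PrefetchQueue *) arg;

    for (;;)
    {
        pthread_mutex_lock(&q->lock);
        while (q->head == q->tail)      /* empty */
            pthread_cond_wait(&q->nonempty, &q->lock);

        uint32_t blkno = q->blocks[q->head];
        q->head = (q->head + 1) % QUEUE_SIZE;
        pthread_mutex_unlock(&q->lock);

        (void) posix_fadvise(q->fd, (off_t) blkno * BLCKSZ, BLCKSZ,
                             POSIX_FADV_WILLNEED);
    }

    return NULL;
}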
>
>> AFAICS this seems similar to parts of the AIO patch - I wonder what that
>> plans to do. I need to check.
>
> Yes, most of this exists there. The difference is that with AIO you don't
> need to prefetch, as you can just initiate the IO for real and wait for it
> to complete.
>
Right, although the line where things stop being "prefetch" and become
"async" seems a bit blurry to me - perhaps it's more a matter of point of
view.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company