From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2024-03-01 15:18:54
Message-ID: 48d3ff87-a435-488f-b803-258dab6485d6@enterprisedb.com
Lists: pgsql-hackers
On 2/15/24 21:30, Peter Geoghegan wrote:
> On Thu, Feb 15, 2024 at 3:13 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>>> This is why I don't think that the tuples with lower page offset
>>> numbers are in any way significant here. The significant part is
>>> whether or not you'll actually need to visit more than one leaf page
>>> in the first place (plus the penalty from not being able to reorder
>>> the work across page boundaries in your initial v1 of prefetching).
>>
>> To me your phrasing just seems to reformulate the issue.
>
> What I said to Tomas seems very obvious to me. I think that there
> might have been some kind of miscommunication (not a real
> disagreement). I was just trying to work through that.
>
>> In practical terms you'll have to wait for the full IO latency when fetching
>> the table tuple corresponding to the first tid on a leaf page. Of course
>> that's also the moment you had to visit another leaf page. Whether the stall
>> is due to visiting another leaf page or due to processing the first entry on such
>> a leaf page is a distinction without a difference.
>
> I don't think anybody said otherwise?
>
>>>> That's certainly true / helpful, and it makes the "first entry" issue
>>>> much less common. But the issue is still there. Of course, this says
>>>> nothing about the importance of the issue - the impact may easily be so
>>>> small it's not worth worrying about.
>>>
>>> Right. And I want to be clear: I'm really *not* sure how much it
>>> matters. I just doubt that it's worth worrying about in v1 -- time
>>> grows short. Although I agree that we should commit a v1 that leaves
>>> the door open to improving matters in this area in v2.
>>
>> I somewhat doubt that it's realistic to aim for 17 at this point.
>
> That's a fair point. Tomas?
>
I think that's a fair assessment.
To me it seems that doing the prefetching solely at the executor level is not
really workable. And even if it can be made to work, there are far too many
open questions to resolve in the last commitfest.
I think the consensus is that at least some of the logic/control needs to
move back into the index AM. Maybe there's some minimal part that we could
do for v17, even if it has various limitations, and then improve it in
v18. Say, prefetching one leaf page at a time and passing a little bit of
information from the index scan to drive this.
But I have a very hard time figuring out what the MVP version should be,
because I have a very limited understanding of how much control the index
AM ought to have :-( And it'd be a bit silly to do something in v17,
only to have to rip it out in v18 because it turned out we didn't get the
split right.
>> We seem to
>> still be doing fairly fundamental architectural work. I think it might be the
>> right thing even for 18 to go for the simpler only-a-single-leaf-page
>> approach though.
>
> I definitely think it's a good idea to have that as a fall back
> option. And to not commit ourselves to having something better than
> that for v1 (though we probably should commit to making that possible
> in v2).
>
Yeah, I agree with that.
>> I wonder if there are prerequisites that can be tackled for 17. One idea is to
>> work on infrastructure to provide executor nodes with information about the
>> number of tuples likely to be fetched - I suspect we'll trigger regressions
>> without that in place.
>
> I don't think that there'll be regressions if we just take the simpler
> only-a-single-leaf-page approach. At least it seems much less likely.
>
I'm sure we could pass additional information from the index scans to
improve that further. But I think a gradual ramp-up of the prefetch distance
would deal with most regressions. At least that's my experience from
benchmarking the early version.
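To be explicit about what I mean by ramp-up: something along the lines of the
doubling scheme bitmap heap scans already use for their prefetch distance
(sketch only, the struct and field names are made up):

#include "postgres.h"

/*
 * Sketch of a gradual prefetch-distance ramp-up, modeled on what bitmap
 * heap scans do with effective_io_concurrency: start at zero and double
 * the target as the scan progresses, so short scans never pay for
 * prefetching they don't need.
 */
typedef struct IndexPrefetchState
{
    int         prefetch_target;    /* current prefetch distance */
    int         prefetch_maximum;   /* cap, e.g. effective_io_concurrency */
} IndexPrefetchState;

static inline void
index_prefetch_ramp_up(IndexPrefetchState *ps)
{
    if (ps->prefetch_target >= ps->prefetch_maximum)
        return;                 /* already at full distance */

    if (ps->prefetch_target == 0)
        ps->prefetch_target = 1;
    else
        ps->prefetch_target = Min(ps->prefetch_target * 2,
                                  ps->prefetch_maximum);
}

A scan that only ever fetches a handful of tuples never gets past a small
prefetch distance, which is why I think this covers most of the regression
risk.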
The hard thing is what to do about cases where neither of these helps.
The example I keep thinking about is index-only scans (IOS) - if we don't
do prefetching, it's not hard to construct cases where a regular index scan
gets much faster than an IOS (with many not-all-visible pages). But we can't
just prefetch all pages either, because that would hurt IOS cases where most
pages are fully visible (and we don't need to actually access the heap).

I managed to deal with this in the executor-level version, but I'm not
sure how to do it if the control moves closer to the index AM.
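To give an idea of the kind of check I mean (just a sketch, the helper is
made up, though VM_ALL_VISIBLE and PrefetchBuffer are existing APIs): consult
the visibility map first, and only prefetch heap pages that are not
all-visible.

#include "postgres.h"

#include "access/visibilitymap.h"
#include "storage/bufmgr.h"
#include "storage/itemptr.h"
#include "utils/rel.h"

/*
 * Sketch of IOS-aware prefetching: only prefetch a heap block if the
 * visibility map says it's not all-visible, i.e. only when the index-only
 * scan will actually have to visit the heap.  Fully-visible pages are
 * skipped, so all-visible IOS workloads pay (almost) nothing.
 *
 * The caller keeps *vmbuffer pinned across calls and releases it at the
 * end of the scan, the same way index-only scans already handle it.
 */
static void
prefetch_if_not_all_visible(Relation heapRel, ItemPointer tid,
                            Buffer *vmbuffer)
{
    BlockNumber block = ItemPointerGetBlockNumber(tid);

    if (!VM_ALL_VISIBLE(heapRel, block, vmbuffer))
        PrefetchBuffer(heapRel, MAIN_FORKNUM, block);
}

The awkward part is where this check lives once the control moves into the
index AM - today the VM check for IOS happens in the executor, not in the AM,
and the lookup needs the heap relation and a vmbuffer pin.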
>> One way to *sometimes* process more than a single leaf page, without having to
>> redesign kill_prior_tuple, would be to use the visibilitymap to check if the
>> target pages are all-visible. If all the table pages on a leaf page are
>> all-visible, we know that we don't need to kill index entries, and thus can
>> move on to the next leaf page.
>
> It's possible that we'll need a variety of different strategies.
> nbtree already has two such strategies in _bt_killitems(), in a way.
> Though its "Modified while not pinned means hinting is not safe" path
> (LSN doesn't match canary value path) seems pretty naive. The
> prefetching stuff might present us with a good opportunity to replace
> that with something fundamentally better.
>
No opinion.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company