From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | "Burd, Greg" <gregburd(at)amazon(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Expanding HOT updates for expression and partial indexes |
Date: | 2025-02-11 21:18:45 |
Message-ID: | CAEze2WgyAUtW64eP8uxZmHUy-RfBtuiKaUybvi=Jc3bwgg3g4Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, 10 Feb 2025 at 20:11, Burd, Greg <gregburd(at)amazon(dot)com> wrote:
> > On Feb 10, 2025, at 12:17 PM, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> wrote:
> >
> >>
> >> I have a few concerns with the patch, things I’d greatly appreciate your thoughts on:
> >>
> >> First, I pass an EState along the update path to enable running the checks in heapam, this works but leaves me feeling as if I violated separation of concerns. If there is a better way to do this let me know or if you think the cost of creating one in the execIndexing.c ExecIndexesRequiringUpdates() is okay that’s another possibility.
> >
> > I think that doesn't have to be bad.
>
> Meaning that the approach I’ve taken is okay with you?
If you mean "passing EState down through the AM so we can check if we
really need HOT updates", then Yes, that's OK. I don't think the basic
idea (executing the projections to check for differences) is bad per
se, however I do think we may need more design work on the exact shape
of how ExecIndexesRequiringUpdates() receives its information (which
includes the EState).
E.g. we could pass an opaquely typed pointer that's passed from the
executor state through the table_update method into this
ExecIndexesRequiringUpdates(). That opaque struct would then contain
the important information for index update state checking, so that the
AM can't realistically break things without bypassing the separation
of concerns, and doesn't have to know about any executor nodes.
>>> Third, there is overhead to this patch, it is no longer a single simple bitmap test to choose HOT or not in heap_update().
>>
>> Why can't it mostly be that simple in simple cases?
>
> It can remain that simple in the cases you mention. In relcache the hot blocking attributes are nearly the same, only the summarizing attributes are removed. The first test then in heap_update() is for overlap with the modified set. When there is none, the update will proceed on the HOT path.
>
> The presence of a summarizing index is determined in ExecIndexesRequiringUpdates() in execIndexing.c, so a slightly longer code path but not much new overhead.
Yes, but that's only determined at an index-by-index level, rather
than determined all at once, and that's real bad when you have
hundreds of indexes to go through (however unlikely it might be, I've
heard of cases where there are 1000s of indexes on one table). So, I
prefer limiting any O(n_indexes) operations to only the most critical
cases.
> > I mean, it's clear that "updated indexed column's value == non-HOT
> > update". And that to determine whether an updated *projected* column's
> > value (i.e., expression index column's value) was actually updated we
> > need to calculate the previous and current index value, thus execute
> > the projection twice. But why would we have significant additional
> > overhead if there are no expression indexes, or when we can know by
> > bitmap overlap that the only interesting cases are summarizing
> > indexes?
>
> You’re right, there’s not a lot of new overhead in that case except what happens in ExecIndexesRequiringUpdates() to scan over the list of IndexInfo. It is really only when there are many expressions/predicates requiring examination that there is any significant cost to this approach AFAICT (but if you see something please point it out).
See the attached approach. Evaluation of the expressions only has to
happen if there are any HOT-blocking attributes which are exclusively
hot-blockingly indexed through expressions, so if the updated
attribute numbers are a subset of hotblockingexprattrs. (substitute
hotblocking with summarizing for the summarizing approach)
> > I would've implemented this with (1) two new bitmaps, one each for
> > normal and summarizing indexes, each containing which columns are
> > exclusively used in expression indexes (and which should thus be used
> > to trigger the (comparatively) expensive recalculation).
>
> That was one where I started, over time that became harder to work as the bitmaps contain the union of index attributes for the table not per-column.
I think it's fairly easy to create, though.
> Now there is one bitmap to cover the broadest case and then a function to find the modified set of indexes where each is examined against bitmaps that contain only attributes specific to the index in question. This helped in cases where there were both expression and non-expression indexes on the same attribute.
Fair, but do we care about one expression index on (attr1->>'data')'s
value *not* changing when an index on (attr1) exists and attr1 has
changed? That index on att1 would block HOT updates regardless of the
(lack of) changes to the (att1->>'data') index, so doing those
expensive calculations seems quite wasteful.
So, in my opinion, we should also keep track of those attributes only
included in expressions of indexes, and that's fairly easy: see
attached prototype.diff.txt (might need some work, the patch was
drafted on v16's codebase, but the idea is clear).
The resulting %exprattrs bitmap contains attributes that are used only
in expressions of those index types.
> > The "new" output of these expression
> > evaluations would be stored to be used later as index datums, reducing
> > the number of per-expression evaluations down to 2 at most, rather
> > than 2+1 when the index needs an insertion but the expression itself
> > wasn't updated.
>
> Not reforming the new index tuples is also an interesting optimization. I wonder how that can be passed from within heapam’s call into a function in execIndexing up into nodeModifiyTable and back down into execIndexing and on to the index access method? I’ll have to think about that, ideas welcome.
Note that index tuple forming happens only in the index AM, it's the
Datum construction (i.e. projection from attributes/tuple to indexed
value) that I'd like to deduplicate. Though looking at the current
code, I don't think it's reasonable to have that as a requirement for
this work. It'd be a nice-to-have for sure, but not as requirement.
> > Do you have any documentation on the approaches used, and the specific
> > differences between v3 and v4? I don't see much of that in your
> > initial mail, and the patches themselves also don't show much of that
> > in their details. I'd like at least some documentation of the new
> > behaviour in src/backend/access/heap/README.HOT at some point before
> > this got marked as RFC in the commitfest app, though preferably sooner
> > rather than later.
>
> Good point, I should have updated README.HOT with the initial patchset. I’ll jump on that and update ASAP.
Thanks in advance.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
Attachment | Content-Type | Size |
---|---|---|
prototype.diff.txt | text/plain | 3.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Smith | 2025-02-11 21:23:37 | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Previous Message | Tom Lane | 2025-02-11 21:18:37 | Re: Bump soft open file limit (RLIMIT_NOFILE) to hard limit on startup |