Re: Expanding HOT updates for expression and partial indexes

From: "Burd, Greg" <gregburd(at)amazon(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expanding HOT updates for expression and partial indexes
Date: 2025-02-10 19:11:13
Message-ID: DBA06D41-32C8-45D5-A3FA-15A412692351@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Feb 10, 2025, at 12:17 PM, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> wrote:
>
>>
>> I have a few concerns with the patch, things I’d greatly appreciate your thoughts on:
>>
>> First, I pass an EState along the update path to enable running the checks in heapam, this works but leaves me feeling as if I violated separation of concerns. If there is a better way to do this let me know or if you think the cost of creating one in the execIndexing.c ExecIndexesRequiringUpdates() is okay that’s another possibility.
>
> I think that doesn't have to be bad.

Meaning that the approach I’ve taken is okay with you?

>> Third, there is overhead to this patch, it is no longer a single simple bitmap test to choose HOT or not in heap_update().
>
> Why can't it mostly be that simple in simple cases?

It can remain that simple in the cases you mention. In relcache the hot blocking attributes are nearly the same, only the summarizing attributes are removed. The first test then in heap_update() is for overlap with the modified set. When there is none, the update will proceed on the HOT path.

The presence of a summarizing index is determined in ExecIndexesRequiringUpdates() in execIndexing.c, so a slightly longer code path but not much new overhead.

> I mean, it's clear that "updated indexed column's value == non-HOT
> update". And that to determine whether an updated *projected* column's
> value (i.e., expression index column's value) was actually updated we
> need to calculate the previous and current index value, thus execute
> the projection twice. But why would we have significant additional
> overhead if there are no expression indexes, or when we can know by
> bitmap overlap that the only interesting cases are summarizing
> indexes?

You’re right, there’s not a lot of new overhead in that case except what happens in ExecIndexesRequiringUpdates() to scan over the list of IndexInfo. It is really only when there are many expressions/predicates requiring examination that there is any significant cost to this approach AFAICT (but if you see something please point it out).

> I would've implemented this with (1) two new bitmaps, one each for
> normal and summarizing indexes, each containing which columns are
> exclusively used in expression indexes (and which should thus be used
> to trigger the (comparatively) expensive recalculation).

That was one where I started, over time that became harder to work as the bitmaps contain the union of index attributes for the table not per-column. Now there is one bitmap to cover the broadest case and then a function to find the modified set of indexes where each is examined against bitmaps that contain only attributes specific to the index in question. This helped in cases where there were both expression and non-expression indexes on the same attribute.

> Then, I'd maintain a (cached) list of unique projections/expressions
> found in indexes, so that 30 indexes on e.g.
> ((mycolumn::jsonb)->>'metadata') only extend to 1 check for
> differences, rather than 30.

An optimization to avoid rechecking isn’t a bad idea. I wonder how hard it would be to surface the field (->>’metadata’) from the index expression to track for redundancy, I’ll have to look into that.

> The "new" output of these expression
> evaluations would be stored to be used later as index datums, reducing
> the number of per-expression evaluations down to 2 at most, rather
> than 2+1 when the index needs an insertion but the expression itself
> wasn't updated.

Not reforming the new index tuples is also an interesting optimization. I wonder how that can be passed from within heapam’s call into a function in execIndexing up into nodeModifiyTable and back down into execIndexing and on to the index access method? I’ll have to think about that, ideas welcome.

> So, it'd be something like (pseudocode):
>
> if (bms_overlap(updated_columns, hotblocking))
> /* if columns only indexed through expressions were updated, do
> expensive stuff. Otherwise, it's a normal non-HOT update. */
> if (bms_subset_compare(updated_columns, hot_expression_columns) in
> (BMS_EQUAL, BMS_SUBSET1))
> expensive check for expression changes + populate index column data
> else
> normal_update
> else if (bms_overlap(updated_columns, summarizing))
> /* same as above for hotblocking, but now summarizing */
> if (bms_subset_compare(updated_columns, sum_expression_columns) in
> (BMS_EQUAL, BMS_SUBSET1))
> expensive check for summarized expression changes + populate
> summarized index column data
> else
> summarizing_update
> else
> hot_update
>
> Note that it is relatively expensive to do check whether any one index
> needs to be updated. It's generally cheaper to do all those checks at
> once, where possible; using one or 2 more bitmaps would be sufficient.
>
> Also note that this approach doesn't update specific summarizing
> indexes, just all of them or none. I think that "update only
> summarizing indexes that were updated" should be a separate patch from
> "check if indexed expressions' values changed", potentially in the
> patchset, but not as part of the main bulk.
>
>> Fourth, I’d like to know which version the community prefers (v3 or v4). I think v4 moves the code in a direction that is cleaner overall, but you may disagree. I realize that the way I use the modified_indexes bitmapset is a tad overloaded (NULL means all indexes should be updated, otherwise only update the indexes in the set which may be all/some/none of the indexes) and that may violate the principal of least surprise but I feel that it is better than the TU_UpdateIndexes enum in the code today.
>
> I would be hesitant to let table AMs decide which indexes to update at
> that precision. Note that this API would allow the AM to update only
> (say) the PK index and no other indexes, which is not allowed to
> happen if index consistentcy is required (which it is).

Interesting, thanks for the feedback. I’ll think on this a bit more and provide more detail with the next update.

> ----->8-----
>
> Do you have any documentation on the approaches used, and the specific
> differences between v3 and v4? I don't see much of that in your
> initial mail, and the patches themselves also don't show much of that
> in their details. I'd like at least some documentation of the new
> behaviour in src/backend/access/heap/README.HOT at some point before
> this got marked as RFC in the commitfest app, though preferably sooner
> rather than later.

Good point, I should have updated README.HOT with the initial patchset. I’ll jump on that and update ASAP.

> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech)

thanks for the thoughtful reply.

-greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2025-02-10 19:30:15 Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Previous Message Melih Mutlu 2025-02-10 19:09:42 Re: speedup COPY TO for partitioned table.