Re: Expanding HOT updates for expression and partial indexes

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: "Burd, Greg" <gregburd(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expanding HOT updates for expression and partial indexes
Date: 2025-02-10 23:20:41
Message-ID: Z6qJydc0BNF8AGPt@nathan
Lists: pgsql-hackers

On Mon, Feb 10, 2025 at 06:17:42PM +0100, Matthias van de Meent wrote:
> I have serious doubts about the viability of any proposal working to
> implement PHOT/WARM in PostgreSQL, as they seem to have an inherent
> nature of fundamentally breaking the TID lifecycle:
> We won't be able to clean up dead-to-everyone TIDs that were
> PHOT-updated, because some index Y may still rely on it, and we can't
> remove the TID from that same index Y because there is still a live
> PHOT/WARM tuple later in the chain whose values for that index haven't
> changed since that dead-to-everyone tuple, and thus this PHOT/WARM
> tuple is the one pointed to by that index.
> For HOT, this isn't much of an issue, because there is just one TID
> that's impacted (and it only occupies a single LP slot, with
> LP_REDIRECT). However, with PHOT/WARM, you'd relatively easily be able
> to fill a page with TIDs (or even full tuples) you can't clean up with
> VACUUM until the moment the PHOT/WARM/HOT chain is broken (due to
> UPDATE leaving the page or the final entry getting DELETE-d).
>
> Unless we are somehow able to replace the TIDs in indexes from
> "intermediate dead PHOT" to "base TID"/"latest TID" (either of which
> is probably also problematic for indexes that expect a TID to appear
> exactly once in the index at any point in time) I don't think the
> system is viable if we maintain only a single data structure to
> contain all dead TIDs. If we had a datastore for dead items per index,
> that'd be more likely to work, but it also would significantly
> increase the memory overhead of vacuuming tables.

I share your concerns, but I don't think things are as dire as you suggest.
For example, perhaps we put a limit on how long a PHOT chain can be, or
maybe we try to detect update patterns that don't work well with PHOT.
Another option could be to allow PHOT updates only when the same set of
indexed columns is updated or when fewer than 50% of the indexed columns
are updated. These aren't fully fleshed-out ideas, of course, but I am at
least somewhat optimistic we could find appropriate trade-offs.
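
To make the heuristics above concrete, here is a rough sketch of how such a
check might be combined into one decision. Everything in it is hypothetical:
phot_update_allowed(), PHOT_MAX_CHAIN_LEN, the value 8, and the bitmask
representation of "which indexed columns changed" are assumptions made for
illustration, not anything that exists in PostgreSQL or in a posted patch.
It also picks one particular way of combining the ideas (a chain-length cap,
then "same set" or "fewer than 50%"), which is only one of several options.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap on PHOT chain length; 8 is an arbitrary placeholder. */
#define PHOT_MAX_CHAIN_LEN 8

/* Count the set bits in a bitmask of modified indexed columns. */
static int
count_bits(uint64_t mask)
{
	int			n = 0;

	while (mask != 0)
	{
		mask &= mask - 1;		/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical policy check (not PostgreSQL code).
 *
 * chain_len       number of PHOT updates already in the chain
 * prev_modified   indexed columns changed by the previous update in the chain
 * new_modified    indexed columns changed by the update being considered
 * n_indexed_cols  total number of indexed columns on the table
 */
static bool
phot_update_allowed(int chain_len, uint64_t prev_modified,
					uint64_t new_modified, int n_indexed_cols)
{
	/* Idea 1: refuse once the chain has reached the length limit. */
	if (chain_len >= PHOT_MAX_CHAIN_LEN)
		return false;

	/* Idea 2: allow if the same set of indexed columns changes again. */
	if (chain_len > 0 && new_modified == prev_modified)
		return true;

	/* Idea 3: allow if strictly fewer than 50% of indexed columns change. */
	return 2 * count_bits(new_modified) < n_indexed_cols;
}
```

With four indexed columns, this sketch would accept an update touching one
column, reject one touching two columns on a fresh chain, and accept a
repeat of the previous update's column set regardless of its size, as long
as the chain is still under the cap.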

--
nathan
