Getting rid of freezing and hint bits by eagerly vacuuming aborted xacts (was: decoupling table and index vacuum)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Getting rid of freezing and hint bits by eagerly vacuuming aborted xacts (was: decoupling table and index vacuum)
Date: 2021-04-23 00:39:54
Message-ID: CAH2-Wz=YdZWZPXM6PN8CrLMcbrn+UVq_xS1o3XoTh9rhiKMfXw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 22, 2021 at 3:52 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Thu, Apr 22, 2021 at 11:16 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > My most ambitious goal is finding a way to remove the need to freeze
> > > or to set hint bits. I think that we can do this by inventing a new
> > > kind of VACUUM just for aborted transactions, which doesn't do index
> > > vacuuming. You'd need something like an ARIES-style dirty page table
> > > to make this cheap -- so it's a little like UNDO, but not very much.
> >
> > I don't see how that works. An aborted transaction can have made index
> > entries, and those index entries can have already been moved by page
> > splits, and there can be arbitrarily many of them, so that you can't
> > keep track of them all in RAM. Also, you can crash after making the
> > index entries and writing them to the disk and before the abort
> > happens. Anyway, this is probably a topic for a separate thread.
>
> This is a topic for a separate thread, but I will briefly address your question.
>
> Under the scheme I've sketched, we never do index vacuuming when
> invoking an autovacuum worker (or something like it) to clean up after
> an aborted transaction. We track the pages that all transactions have
> modified. If a transaction commits then we quickly discard the
> relevant dirty page table metadata. If a transaction aborts
> (presumably a much rarer event), then we launch an autovacuum worker
> that visits precisely those heap blocks that were modified by the
> aborted transaction, and just prune each page, one by one. We have a
> cutoff that works a little like relfrozenxid, except that it tracks
> the point in the XID space before which we know any XIDs (any XIDs
> that we can read from extant tuple headers) must be committed.
>
> The idea of a "dirty page table" is standard ARIES. It'd be tricky to
> get it working, but still quite possible.
>
> The overall goal of this design is for the system to be able to reason
> about committed-ness inexpensively (to obviate the need for hint bits
> and per-tuple freezing). We want to be able to say for sure that
> almost all heap blocks in the database only contain heap tuples whose
> headers contain only committed XIDs, or LP_DEAD items that are simply
> dead (the exact provenance of these LP_DEAD items is not a concern,
> just like today). The XID cutoff for committed-ness could be kept
> quite recent, both because aborted transactions are naturally rare
> and because we can "logically roll back" aborted transactions with
> relatively little work.
>
> Note that a heap tuple whose xmin and xmax are committed might also be
> dead under this scheme, since of course it might have been updated or
> deleted by an xact that committed. We've effectively decoupled things
> by making aborted transactions special, and subject to very eager
> cleanup.
>
> I'm sure that there are significant challenges with making something
> like this work. But to me this design seems roughly the right
> combination of radical and conservative.
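
A minimal sketch of the dirty-page-table flow described in the quoted proposal might look like the following. A commit simply discards its entries. An abort hands a cleanup worker exactly the heap blocks that the aborted transaction touched, and the worker prunes those blocks one by one without any index vacuuming. This is a hypothetical toy model, not PostgreSQL code: `DirtyPageTable`, `prune_aborted`, `cleanup_aborted_xact` and the simplified page/tuple classes are invented for illustration. It also assumes, beyond what the proposal spells out, that a tuple whose xmax is the aborted XID has that xmax cleared, so that surviving headers carry only committed XIDs. Crash safety, which Robert raises and which a real dirty page table would presumably have to handle through WAL, is ignored here.

```python
from collections import defaultdict
from dataclasses import dataclass, field

INVALID_XID = 0  # hypothetical stand-in for "no XID"


@dataclass
class TupleHeader:
    xmin: int
    xmax: int = INVALID_XID
    lp_dead: bool = False  # True once reduced to an LP_DEAD stub


@dataclass
class HeapPage:
    tuples: list = field(default_factory=list)


class DirtyPageTable:
    """Hypothetical per-transaction record of modified heap blocks."""

    def __init__(self):
        self._pages = defaultdict(set)  # xid -> {(relation, block), ...}

    def record_modification(self, xid, relation, block):
        self._pages[xid].add((relation, block))

    def on_commit(self, xid):
        # Committed work needs no cleanup: just forget the xact's pages.
        self._pages.pop(xid, None)

    def on_abort(self, xid):
        # Return exactly the blocks the aborted xact touched, in block order.
        return sorted(self._pages.pop(xid, set()))


def prune_aborted(page, aborted_xid):
    """Logically roll back one aborted xact on one heap page (no index access)."""
    for tup in page.tuples:
        if tup.lp_dead:
            continue
        if tup.xmin == aborted_xid:
            tup.lp_dead = True  # inserted by the aborted xact: simply dead
        elif tup.xmax == aborted_xid:
            tup.xmax = INVALID_XID  # assumption: the aborted delete is undone


def cleanup_aborted_xact(dpt, heap, aborted_xid):
    """Hypothetical stand-in for the autovacuum-like worker launched on abort."""
    blocks = dpt.on_abort(aborted_xid)
    for relation, block in blocks:
        prune_aborted(heap[relation][block], aborted_xid)
    return blocks


# Demo: xact 100 commits; xact 101 inserts on block 0, deletes on block 1,
# and then aborts.
heap = {"t": {0: HeapPage([TupleHeader(xmin=100), TupleHeader(xmin=101)]),
              1: HeapPage([TupleHeader(xmin=100, xmax=101)])}}
dpt = DirtyPageTable()
dpt.record_modification(100, "t", 0)
dpt.record_modification(100, "t", 1)
dpt.on_commit(100)
dpt.record_modification(101, "t", 0)
dpt.record_modification(101, "t", 1)
visited = cleanup_aborted_xact(dpt, heap, 101)
# visited == [("t", 0), ("t", 1)]; the xmin=101 tuple is now LP_DEAD and the
# block 1 tuple's xmax is back to INVALID_XID.
```

In this model, tuples inserted by the aborted transaction are left as LP_DEAD stubs rather than having their line pointers freed. That is what lets the cleanup skip index vacuuming: any index entries can keep pointing at the stub, as they can today.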

I'll start a new thread now, as a placeholder for further discussion.

This would be an incredibly ambitious project, and I'm sure that this
thread will be very hand-wavy at first. But you've got to start
somewhere.
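
The committed-ness cutoff from the quoted proposal is the point in the XID space before which any XID still readable from a tuple header must be committed. It could be sketched roughly as follows. This is a hypothetical illustration, not anything that exists in PostgreSQL. `committed_cutoff` and `xid_is_committed` are invented names. XIDs are treated as plain integers that never wrap around, unlike PostgreSQL's 32-bit XIDs, which need wraparound-aware comparison. The commit log is modeled as a caller-supplied lookup function.

```python
from typing import Callable, Iterable


def committed_cutoff(next_xid: int, running: Iterable[int],
                     pending_abort_cleanup: Iterable[int]) -> int:
    """Hypothetical: oldest XID that may still be uncommitted in some header.

    Every XID below the result either committed, or aborted and has already
    been pruned away, so any such XID still readable from a tuple header
    must be committed.
    """
    return min([next_xid, *running, *pending_abort_cleanup])


def xid_is_committed(xid: int, cutoff: int,
                     clog_lookup: Callable[[int], bool]) -> bool:
    """Decide committed-ness without hint bits for all but recent XIDs."""
    if xid < cutoff:
        return True  # fast path: no hint bit, no commit-log access
    return clog_lookup(xid)  # recent XIDs still take the slow path


# Demo: xact 121 aborted but its cleanup hasn't run yet; 125 and 127 are running.
clog = {120: True, 121: False}
cutoff = committed_cutoff(next_xid=130, running=[125, 127],
                          pending_abort_cleanup=[121])
# cutoff == 121, so XID 120 is known committed from the cutoff alone,
# while XID 121 is still checked against the commit log (and is not committed).
```

In this model, aborts are rare and each one is cleaned up promptly, so `pending_abort_cleanup` would usually be short. The cutoff would then mostly track the oldest running transaction, which is why the proposal expects it to stay quite recent.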

--
Peter Geoghegan
