Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()

From: Noah Misch <noah(at)leadboat(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, robertmhaas(at)gmail(dot)com, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Date: 2024-01-08 18:21:25
Message-ID: 20240108182125.f8.nmisch@google.com
Lists: pgsql-bugs

On Mon, Jan 08, 2024 at 12:02:01PM -0500, Peter Geoghegan wrote:
> On Sat, Jan 6, 2024 at 5:44 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > Tied to that decision is the choice of semantics when the xmin horizon moves
> > backward during one VACUUM, e.g. when a new walsender xmin does so. Options:
> >
> > 1. Continue to remove tuples based on the OldestXmin from VACUUM's start. We
> > could have already removed some of those tuples, so the walsender xmin
> > won't achieve a guarantee anyway. (VACUUM would want ratchet-like behavior
> > in GlobalVisState, possibly by sharing OldestXmin with pruneheap like you
> > say.)
> >
> > 2. Move OldestXmin backward, to reflect the latest xmin horizon. (Perhaps
> > VACUUM would just pass GlobalVisState to a function that returns the
> > compatible OldestXmin.)
> >
> > Which way is better?
>
> I suppose that a hybrid of these two approaches makes the most sense.
> A design that's a lot closer to your #1 than to your #2.
>
> Under this scheme, pruneheap.c would be explicitly aware of
> OldestXmin, and would promise to respect the exact invariant that we
> need to avoid getting stuck in lazy_scan_prune's loop (or avoid
> confused NewRelfrozenXid tracking on HEAD, which no longer has this
> loop). But it'd be limited to that exact invariant; we'd still avoid
> unduly imposing any requirements on pruning-away deleted tuples whose
> xmax was >= OldestXmin. lazy_scan_prune/vacuumlazy.c shouldn't care if
> we prune away any "extra" heap tuples, just because we can (or just
> because it's convenient to the implementation). Andres has in the past
> placed a lot of emphasis on being able to update the
> GlobalVisState-wise bounds on the fly. Not sure that it's really that
> important that VACUUM does that, but there is no reason to not allow
> it. So we can keep that property (as well as the aforementioned
> high-level OldestXmin immutability property).
>
> More importantly (at least to me), this scheme allows vacuumlazy.c to
> continue to treat OldestXmin as an immutable cutoff for both pruning
> and freezing -- the high level design doesn't need any revisions. We
> already "freeze away" multixact member XIDs >= OldestXmin in certain
> rare cases (i.e. we remove lockers that are determined to no longer be
> running in FreezeMultiXactId's "second pass" slow path), so there is
> nothing fundamentally novel about the idea of removing some extra
> XIDs >= OldestXmin in passing, just because it happens to be convenient to
> some low-level piece of code that's external to vacuumlazy.c.
>
> What do you think of that general approach?

That all sounds good to me.
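A minimal sketch of the hybrid rule agreed on above. It assumes a simplified model: the tuple's deleter (xmax) has committed, and it is compared against only two horizons. One is the OldestXmin fixed at VACUUM start. The other is a possibly newer GlobalVisState-style horizon. `xid_precedes` and `can_prune_deleted_tuple` are hypothetical names, not pruneheap.c functions. `xid_precedes` only mimics the wraparound-aware comparison done by PostgreSQL's `TransactionIdPrecedes()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's TransactionId and
 * TransactionIdPrecedes(); not the real definitions. */
typedef uint32_t TransactionId;
#define FIRST_NORMAL_XID ((TransactionId) 3)

bool xid_precedes(TransactionId a, TransactionId b)
{
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a < b;                 /* special XIDs: plain comparison */
    return (int32_t) (a - b) < 0;     /* normal XIDs: modulo-2^32 comparison */
}

/*
 * Hypothetical pruning decision for a tuple whose deleting transaction
 * (xmax) is assumed to have committed.
 *
 * Required: xmax precedes OldestXmin.  VACUUM treats such a tuple as
 * DEAD, so pruning must remove it.  Otherwise lazy_scan_prune() could
 * see a DEAD tuple that pruning left behind.  This holds even if the
 * current horizon has since moved backward.
 *
 * Optional: xmax precedes the current GlobalVisState-style horizon,
 * which may have advanced since VACUUM started.  Removing such "extra"
 * tuples is allowed, but VACUUM does not depend on it.
 */
bool can_prune_deleted_tuple(TransactionId xmax,
                             TransactionId oldest_xmin,
                             TransactionId current_horizon)
{
    if (xid_precedes(xmax, oldest_xmin))
        return true;                              /* required removal */
    return xid_precedes(xmax, current_horizon);   /* optional removal */
}
```

For example, take an OldestXmin of 1000. A tuple deleted by XID 900 is pruned even if a new walsender xmin pulls the current horizon back to 800. A tuple deleted by XID 1050 is pruned only once the current horizon has moved past 1050.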

> I see no reason why it
> matters if OldestXmin goes backwards across two VACUUM operations, so
> I haven't tried to avoid that.

That may be fully okay, or we may want to clamp OldestXmin to be no older than
relfrozenxid. I don't feel great about the system moving relfrozenxid
backward unless it observed an older XID, and observing an older XID would be
a corruption signal. I don't have a specific way non-monotonic relfrozenxid
breaks things, though.
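A minimal sketch of the clamp suggested above, under the same simplifying assumptions. XIDs are compared modulo 2^32, in the style of `TransactionIdPrecedes()`. `clamp_oldest_xmin` and `xid_predates_relfrozenxid` are hypothetical helpers, not existing PostgreSQL functions. The clamp keeps the cutoff from falling behind the table's recorded relfrozenxid. The second helper flags the case described as a corruption signal: an observed XID older than relfrozenxid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's TransactionId and
 * TransactionIdPrecedes(); not the real definitions. */
typedef uint32_t TransactionId;
#define FIRST_NORMAL_XID ((TransactionId) 3)

bool xid_precedes(TransactionId a, TransactionId b)
{
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a < b;                 /* special XIDs: plain comparison */
    return (int32_t) (a - b) < 0;     /* normal XIDs: modulo-2^32 comparison */
}

/* Hypothetical: never let this VACUUM's cutoff be older than the
 * relfrozenxid already recorded for the table. */
TransactionId clamp_oldest_xmin(TransactionId oldest_xmin,
                                TransactionId relfrozenxid)
{
    return xid_precedes(oldest_xmin, relfrozenxid) ? relfrozenxid
                                                   : oldest_xmin;
}

/* Hypothetical: a normal XID found in the table that predates
 * relfrozenxid should not exist.  Seeing one suggests corruption,
 * not a reason to move relfrozenxid backward.  Special XIDs (for
 * example the frozen XID) are ignored. */
bool xid_predates_relfrozenxid(TransactionId xid, TransactionId relfrozenxid)
{
    return xid >= FIRST_NORMAL_XID && xid_precedes(xid, relfrozenxid);
}
```

This is a defensive choice. As noted above, there is no specific known way that a non-monotonic relfrozenxid breaks things.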
