Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, Bowen Shi <zxwsbg12138(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Date: 2024-05-22 16:57:26
Message-ID: CAAKRu_YZeaS92OY02xFpgsgyLpEZaQ4DA5-s9GX97pp208ex7g@mail.gmail.com
Lists: pgsql-bugs

On Mon, May 20, 2024 at 4:48 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
>
> On Mon, May 20, 2024 at 11:58:23AM -0400, Melanie Plageman wrote:
> > On Sat, May 18, 2024 at 6:23 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > Are there obstacles to fixing the hang by back-patching 1ccc1e05ae instead of
> > > this? We'll need to get confident about 1ccc1e05ae before v17, and that
> > > sounds potentially easier than getting confident about both 1ccc1e05ae and
> > > this other approach.
> >
> > I haven't tried back-patching 1ccc1e05ae yet, but I don't understand
> > why we would want to use stable back branches to get comfortable with
> > an approach committed to an unreleased version of Postgres.
>
> I wouldn't say we want to use stable back branches to get comfortable with an
> approach. I wanted to say that it's less work to be confident about one new
> way of doing things than two new ways of doing things.

I think I understand your point better now.

> > The small fix proposed in this thread is extremely minimal and
> > straightforward. It seems much less risky as a backpatch.
>
> Here's how I model the current and proposed code:
>
> 1. v15 VACUUM removes tuples that are HEAPTUPLE_DEAD according to VisTest.
> OldestXmin doesn't cause tuple removal, but there's a hang when OldestXmin
> rules DEAD after VisTest ruled RECENTLY_DEAD.
>
> 2. With 1ccc1e05ae, v17 VACUUM still removes tuples that are HEAPTUPLE_DEAD
> according to VisTest only. OldestXmin doesn't come into play.
>
> 3. fix_hang_15.patch would make v15 VACUUM remove tuples that are
> HEAPTUPLE_DEAD according to _either_ VisTest or OldestXmin.
>
> Since (3) is the only list entry that removes tuples that VisTest ruled
> RECENTLY_DEAD, I find it higher risk. For all three, the core task of
> certifying the behavior is confirming that its outcome is sound when VisTest
> and OldestXmin disagree. How do you model it?

Okay, I see your point. In 1 and 2, a tuple that would have been
considered HEAPTUPLE_DEAD at the beginning of vacuum but is considered
HEAPTUPLE_RECENTLY_DEAD at pruning time is not removed. In 3, a tuple
that would have been considered HEAPTUPLE_DEAD at the beginning of
vacuum is always removed, regardless of whether it would be considered
HEAPTUPLE_RECENTLY_DEAD at pruning time.

One option is to add the logic in fix_hang_15.patch to master as well
(always remove tuples older than OldestXmin). That would address your
concern by leaving only a single approach to gain confidence in.

However, I can see how removing more tuples could be concerning. In
the case where the horizon moves backwards because a standby
reconnects, I think the worst case is that removing such a tuple
causes a recovery conflict on the standby (depending on the value of
max_standby_streaming_delay et al).

I'll experiment with applying 1ccc1e05ae to 14 and 15 and see how it goes.

> > > Regarding getting confident about 1ccc1e05ae, I think I
> > > follow the upthread arguments that it does operate correctly. As a cross
> > > check, I looked at each mention of oldestxmin in vacuumlazy.c and vacuum.c.
> > > Does the following heap_vacuum_rel() comment need an update?
> > >
> > > /*
> > > * Get cutoffs that determine which deleted tuples are considered DEAD,
> > > * not just RECENTLY_DEAD, and which XIDs/MXIDs to freeze. Then determine
>
> Since 1ccc1e05ae, the cutoffs (values acquired in vacuum_get_cutoffs()) don't
> decide DEAD.

Thinking about it more, this comment is misleading either way. The
values acquired in vacuum_get_cutoffs() don't determine which tuples
are removed as HEAPTUPLE_DEAD. And if a tuple is considered
HEAPTUPLE_RECENTLY_DEAD by VisTest but HEAPTUPLE_DEAD when compared to
OldestXmin, there will be a hang.
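
As I understand the back-branch mechanics, that hang is the retry loop
in lazy_scan_prune(): pruning (driven by VisTest) declines to remove
the tuple, the HeapTupleSatisfiesVacuum() recheck against OldestXmin
still reports HEAPTUPLE_DEAD, and we go around again. A toy standalone
model of that disagreement (stub helpers and made-up values; the retry
is capped here only so the program terminates):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t Xid;

static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;   /* core of TransactionIdPrecedes() */
}

/* VisTest's horizon slipped behind OldestXmin, so pruning keeps the
 * tuple (stand-in for the VisTest-driven pruning). */
static bool
prune_with_vistest(Xid xmax, Xid vistest_horizon)
{
    return xid_precedes(xmax, vistest_horizon);   /* true => removed */
}

/* Stand-in for the recheck against OldestXmin. */
static bool
recheck_with_oldestxmin(Xid xmax, Xid oldest_xmin)
{
    return xid_precedes(xmax, oldest_xmin);       /* true => DEAD */
}

int
main(void)
{
    Xid oldest_xmin = 1000;
    Xid vistest_horizon = 990;  /* moved backwards, e.g. standby reconnect */
    Xid xmax = 995;
    int attempts = 0;

    while (attempts < 5)
    {
        attempts++;
        if (prune_with_vistest(xmax, vistest_horizon))
            break;                        /* tuple removed: done */
        if (!recheck_with_oldestxmin(xmax, oldest_xmin))
            break;                        /* not DEAD: done */
        /* DEAD per OldestXmin but not removed: the real code retries */
    }

    printf("gave up after %d attempts; the real loop would retry forever\n",
           attempts);
    return 0;
}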

> > > * the extent of the blocks that we'll scan in lazy_scan_heap. It has to
>
> Still accurate.

I thought about this more, and I actually don't understand what "Then
determine the extent of the blocks that we'll scan in lazy_scan_heap"
means. I assume it refers to RelationGetNumberOfBlocks() which is
called after vacuum_get_cutoffs(). But, if so, I have no idea how that
relates to the following sentence which says that it "has to happen in
this order to ensure that the OldestXmin cutoff field works as an
upper bound"...

> > > * happen in this order to ensure that the OldestXmin cutoff field works
> > > * as an upper bound on the XIDs stored in the pages we'll actually scan
>
> That sentence's problem predates 1ccc1e05ae. (OldestXmin is a lower bound on
> XIDs in pages we don't scan, not an upper bound on XIDs in pages we scan. To
> be pedantic, snapshots don't affect this bound, so the oldest running XID is
> also a lower bound.)

Hmm. I'm not sure I agree with your interpretation here. Or perhaps I
don't understand it. We initialize NewRelfrozenXid with OldestXmin and
that is, in turn, used to initialize pagefrz.NoFreezePageRelfrozenXid.

HeapPageFreeze->NoFreezePageRelfrozenXid is updated with any older xid
found in an unfrozen tuple on the pages we _do_ scan. That makes
OldestXmin an upper bound on unfrozen xids on the pages we do scan.
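
That ratchet is simple enough to show in a standalone sketch.
Stand-in types, names, and values; in the tree this is spread across
heap_vacuum_rel() and the freeze-planning code, but this is the shape
as I understand it:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t Xid;

static int
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;   /* core of TransactionIdPrecedes() */
}

int
main(void)
{
    Xid OldestXmin = 1000;
    Xid NewRelfrozenXid = OldestXmin;   /* initialized from OldestXmin,
                                         * per the description above */

    /* XIDs left unfrozen on the pages we actually scanned */
    Xid unfrozen[] = {1005, 998, 1002, 990};

    for (int i = 0; i < 4; i++)
    {
        /* same idea as the NoFreezePageRelfrozenXid bookkeeping:
         * ratchet the candidate relfrozenxid back behind any
         * unfrozen XID we leave behind */
        if (xid_precedes(unfrozen[i], NewRelfrozenXid))
            NewRelfrozenXid = unfrozen[i];
    }

    /* can never come out newer than OldestXmin, and never skips an
     * unfrozen XID we saw */
    printf("candidate relfrozenxid: %u (OldestXmin was %u)\n",
           (unsigned) NewRelfrozenXid, (unsigned) OldestXmin);
    return 0;
}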

I don't think we can say anything about the XIDs of tuples on pages we
don't scan. We skip pages if they are all visible. We don't know that
tuples on those pages are newer than OldestXmin. That's why we can't
update relfrozenxid when we skip pages that are not all frozen.

And even if we skip scanning pages because they are all frozen, their
xids will be older than OldestXmin, so it wouldn't be a lower bound on
those either.

I don't understand "snapshots don't affect this bound, so the oldest
running XID is also a lower bound".

> > > * (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
>
> Still accurate.

Agree

> > > *
> > > * Next acquire vistest, a related cutoff that's used in pruning. We
>
> Still accurate.

Agree.

> > > * expect vistest will always make heap_page_prune_and_freeze() remove any
> > > * deleted tuple whose xmax is < OldestXmin.
>
> Since 1ccc1e05ae, that sentence does not apply.

That's true. I wonder how we should modify it. Some options:

1) change it from "always" to "usually"
2) remove the sentence because it is not worth mentioning OldestXmin
in relation to removing tuples
3) explain the situation with GlobalVisTest moving the horizon
backwards to before OldestXmin
4) change it to say vistest will always cause pruning to remove
deleted tuples visible as deleted to everyone

I also think we should change this part:

* lazy_scan_prune must never
* become confused about whether a tuple should be frozen or removed. (In
* the future we might want to teach lazy_scan_prune to recompute vistest
* from time to time, to increase the number of dead tuples it can prune
* away.)

lazy_scan_prune() can't really become confused about whether a tuple
should be frozen or removed. It may choose to freeze a tuple that was
ruled RECENTLY_DEAD by vistest and not removed. But that is okay. I
don't think there is any confusion.

I'm not sure about the second part either. Recomputing vistest in
lazy_scan_prune() would not cause more tuples to be removed. Doing so
in heap_page_prune_and_freeze(), for reasons other than the current
logic in GlobalVisTestShouldUpdate(), would have that effect, but I
can't tell if that is even what this comment is suggesting.

> (Long-term, I favor removal of the cutoffs->OldestXmin field. I'm not aware
> of any calculation for which it remains the optimal value.)

I'm trying to work out whether there is any complication with using
GlobalVisState for freezing, given that it could be updated during the
course of vacuuming the relation. If we used GlobalVisState when
determining what to freeze and maybe_needed moved backwards, could we
end up incorrectly updating relfrozenxid?
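
To make the scenario I'm wondering about concrete (a toy sketch with
made-up values, and explicitly not a claim about what the real code
would do): if the horizon used for freezing could slide backwards
between pages, an XID that an earlier page's horizon would have frozen
could survive unfrozen on a later page, and relfrozenxid stays correct
only if the NewRelfrozenXid tracking still notices it.

#include <stdint.h>
#include <stdio.h>

typedef uint32_t Xid;

static int
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;   /* core of TransactionIdPrecedes() */
}

int
main(void)
{
    /* Hypothetical per-page freeze horizon that moved backwards
     * mid-vacuum, and one old XID encountered on each page.  In this
     * toy model every XID older than the page's horizon gets frozen,
     * which is of course a simplification of the real freeze rules. */
    Xid page_horizon[2] = {1000, 990};
    Xid page_xid[2]     = {980, 995};
    Xid NewRelfrozenXid = 1000;         /* started at OldestXmin */

    for (int page = 0; page < 2; page++)
    {
        if (xid_precedes(page_xid[page], page_horizon[page]))
        {
            /* frozen: this XID no longer constrains relfrozenxid */
        }
        else if (xid_precedes(page_xid[page], NewRelfrozenXid))
        {
            /* left unfrozen: the candidate relfrozenxid must be
             * ratcheted back, or the final relfrozenxid is wrong */
            NewRelfrozenXid = page_xid[page];
        }
    }

    /* XID 995 survives unfrozen on page 2 even though page 1's horizon
     * was 1000; a relfrozenxid of 1000 is only avoided because the
     * bookkeeping saw it. */
    printf("candidate relfrozenxid: %u\n", (unsigned) NewRelfrozenXid);
    return 0;
}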

- Melanie
