Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Date: 2024-06-21 00:02:16
Message-ID: CAH2-WzkZUDr8TQDPuax1SmJg9B5yz-Qhr7NdoQJD5PpXLAUA7Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 20, 2024 at 7:42 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> If vacuum fails to remove a tuple with xmax older than
> VacuumCutoffs->OldestXmin and younger than
> GlobalVisState->maybe_needed, it will ERROR out when determining
> whether or not to freeze the tuple with "cannot freeze committed
> xmax".
>
> In back branches starting with 14, failing to remove tuples older than
> OldestXmin during pruning caused vacuum to infinitely loop in
> lazy_scan_prune(), as investigated on this [1] thread.

This is a great summary.
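
To make the window in the quoted summary concrete, here is a toy Python
model of it. It is only a sketch under loud assumptions: XIDs are plain
integers with no wraparound, pruning consults a single maybe_needed
horizon, and the names Horizons, prune_removes, freeze_check and
vacuum_tuple are hypothetical, not PostgreSQL functions.

```python
from dataclasses import dataclass


@dataclass
class Horizons:
    oldest_xmin: int   # stands in for VacuumCutoffs->OldestXmin
    maybe_needed: int  # stands in for GlobalVisState->maybe_needed


def prune_removes(xmax: int, h: Horizons) -> bool:
    # Assumption: pruning decides removability from maybe_needed alone.
    return xmax < h.maybe_needed


def freeze_check(xmax: int, h: Horizons) -> str:
    # A committed xmax older than OldestXmin is expected to be gone already,
    # so the freeze step treats a surviving one as an error.
    if xmax < h.oldest_xmin:
        raise RuntimeError(f"cannot freeze committed xmax {xmax}")
    return "kept"


def vacuum_tuple(xmax: int, h: Horizons) -> str:
    # Prune first; only tuples that survive pruning reach the freeze check.
    if prune_removes(xmax, h):
        return "removed"
    return freeze_check(xmax, h)
```

In this model, a dead tuple with xmax 100 is removed while maybe_needed
is 120, but once maybe_needed moves backwards to 90 (with OldestXmin
still at 150) the tuple survives pruning and the freeze check raises.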

> We can fix this by always removing tuples considered dead before
> VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby
> has a transaction that sees that tuple as alive, because it will
> simply wait to replay the removal until it would be correct to do so
> or recovery conflict handling will cancel the transaction that sees
> the tuple as alive and allow replay to continue.

I think that this is the right general approach.
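
As a rough sketch of that approach (not the actual patch), the
removability test could consult OldestXmin before falling back to the
GlobalVisState horizon. The function name is_removable is hypothetical,
and XIDs are again plain integers with wraparound ignored.

```python
def is_removable(xmax: int, oldest_xmin: int, maybe_needed: int) -> bool:
    # Anything dead before OldestXmin is always removed, even if
    # maybe_needed has since moved backwards past it.
    if xmax < oldest_xmin:
        return True
    # Otherwise fall back to the (possibly newer) GlobalVisState horizon.
    return xmax < maybe_needed
```

With this ordering, a tuple with xmax 100 is removable when OldestXmin is
150 regardless of where maybe_needed sits, so it never reaches the freeze
check that would otherwise ERROR.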

> The repro forces a round of index vacuuming after the standby
> reconnects and before pruning a dead tuple whose xmax is older than
> OldestXmin.
>
> At the end of the round of index vacuuming, _bt_pendingfsm_finalize()
> calls GetOldestNonRemovableTransactionId(), thereby updating the
> backend's GlobalVisState and moving maybe_needed backwards.

Right. I saw details exactly consistent with this when I used GDB
against a production instance.

I'm glad that you were able to come up with a repro that involves
exactly the same basic elements, including index page deletion.

--
Peter Geoghegan
