From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com> |
Subject: | Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin |
Date: | 2024-06-21 00:02:16 |
Message-ID: | CAH2-WzkZUDr8TQDPuax1SmJg9B5yz-Qhr7NdoQJD5PpXLAUA7Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jun 20, 2024 at 7:42 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> If vacuum fails to remove a tuple with xmax older than
> VacuumCutoffs->OldestXmin and younger than
> GlobalVisState->maybe_needed, it will ERROR out when determining
> whether or not to freeze the tuple with "cannot freeze committed
> xmax".
>
> In back branches starting with 14, failing to remove tuples older than
> OldestXmin during pruning caused vacuum to infinitely loop in
> lazy_scan_prune(), as investigated on this [1] thread.
This is a great summary.
> We can fix this by always removing tuples considered dead before
> VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby
> has a transaction that sees that tuple as alive, because it will
> simply wait to replay the removal until it would be correct to do so
> or recovery conflict handling will cancel the transaction that sees
> the tuple as alive and allow replay to continue.
I think that this is the right general approach.
> The repro forces a round of index vacuuming after the standby
> reconnects and before pruning a dead tuple whose xmax is older than
> OldestXmin.
>
> At the end of the round of index vacuuming, _bt_pendingfsm_finalize()
> calls GetOldestNonRemovableTransactionId(), thereby updating the
> backend's GlobalVisState and moving maybe_needed backwards.
Right. I saw details exactly consistent with this when I used GDB
against a production instance.
I'm glad that you were able to come up with a repro that involves
exactly the same basic elements, including index page deletion.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2024-06-21 00:22:36 | Re: ON ERROR in json_query and the like |
Previous Message | Bruce Momjian | 2024-06-21 00:01:19 | PG 17 and GUC variables |