Re: Xmax precedes relation freeze threshold errors

From: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: "Sergey Aleynikov" <sergey(dot)aleynikov(at)gmail(dot)com>, pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Xmax precedes relation freeze threshold errors
Date: 2022-06-14 16:12:08
Message-ID: ce53f0a4-ff33-40a7-b8b1-80b524aee52d@www.fastmail.com
Lists: pgsql-admin

On Tue, Jun 14, 2022, at 2:27 PM, Sergey Aleynikov wrote:
> /usr/lib/postgresql/14/bin/pg_amcheck -h /var/run/pgsql15/ -d poker -t
> public.user_tournament --heapallindexed
> heap table "poker.public.user_tournament", block 30356, offset 125:
> xmax 634989520 precedes relation freeze threshold 12:634871433

Hmm, some wild guesses:

- There used to be a bug whereby VACUUM failed to truncate pages at the end of a relation when all their tuples were removed; in order for this to happen, you need some disaster to occur at the end of vacuum (such as a system crash at just that time, or an NFS failure). This would cause these tuples, which should have disappeared, to remain. This is hard to solve: you have to figure out which tuples should have been removed and delete them, while at the same time retaining any tuples that were added to those pages afterwards. This is not super common, but it's definitely a possibility.

- Maybe there's some bug in amcheck that causes it to report tuples with an old xmax that are in reality frozen? I don't think this is very likely, but in order to discard this hypothesis, you'd have to show the output of `heap_page_items` for the pages in question, or at least give some thought to the bits in `t_infomask`.

- Maybe you promoted a standby in some wrong way. I don't know what this entails, but I've seen it claimed that if you fail to follow the documented procedures exactly, you might end up with broken data pages.
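
To help with the second guess, here is a rough Python sketch for reading the `t_infomask` value that `heap_page_items` returns (for instance from something like `SELECT lp, t_xmin, t_xmax, t_infomask FROM heap_page_items(get_raw_page('public.user_tournament', 30356)) WHERE lp = 125;`, assuming the pageinspect extension is installed). The bit values are the ones in `src/include/access/htup_details.h`; the helper names `decode_infomask` and `xmax_is_plain_xid` are made up for illustration, and this only covers the xmin/xmax-related bits, not everything amcheck looks at.

```python
from typing import List

# Bit values from PostgreSQL's src/include/access/htup_details.h.
HEAP_XMAX_LOCK_ONLY = 0x0080
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID = 0x0200
HEAP_XMIN_FROZEN = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID  # both bits set
HEAP_XMAX_INVALID = 0x0800
HEAP_XMAX_IS_MULTI = 0x1000

# xmax-related bits, in ascending bit order.
XMAX_FLAGS = [
    (0x0010, "HEAP_XMAX_KEYSHR_LOCK"),
    (0x0040, "HEAP_XMAX_EXCL_LOCK"),
    (HEAP_XMAX_LOCK_ONLY, "HEAP_XMAX_LOCK_ONLY"),
    (0x0400, "HEAP_XMAX_COMMITTED"),
    (HEAP_XMAX_INVALID, "HEAP_XMAX_INVALID"),
    (HEAP_XMAX_IS_MULTI, "HEAP_XMAX_IS_MULTI"),
]


def decode_infomask(mask: int) -> List[str]:
    """Hypothetical helper: name the xmin/xmax bits set in t_infomask."""
    names = []
    # HEAP_XMIN_FROZEN is the combination of the two xmin bits, so test it first.
    xmin_bits = mask & HEAP_XMIN_FROZEN
    if xmin_bits == HEAP_XMIN_FROZEN:
        names.append("HEAP_XMIN_FROZEN")
    elif xmin_bits == HEAP_XMIN_COMMITTED:
        names.append("HEAP_XMIN_COMMITTED")
    elif xmin_bits == HEAP_XMIN_INVALID:
        names.append("HEAP_XMIN_INVALID")
    names.extend(name for bit, name in XMAX_FLAGS if mask & bit)
    return names


def xmax_is_plain_xid(mask: int) -> bool:
    """Hypothetical helper: True when xmax is an ordinary updater/deleter xid,
    i.e. not marked invalid, not a mere locker, and not a multixact."""
    return not (mask & (HEAP_XMAX_INVALID | HEAP_XMAX_LOCK_ONLY | HEAP_XMAX_IS_MULTI))


if __name__ == "__main__":
    # Paste the t_infomask value from heap_page_items here.
    example = 0x0500
    print(decode_infomask(example), xmax_is_plain_xid(example))
```

If the reported tuples have `HEAP_XMAX_INVALID` or `HEAP_XMAX_LOCK_ONLY` set, the old xmax value is not naming a deleter, which is the kind of detail that would be worth including when posting the `heap_page_items` output.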

If these block numbers are, or were at some time in the past, near the end of the table, then the first possibility sounds the most likely of these three. However, if you have dozens of tables with the same problem, there might be something else going on.
