Re: pg_amcheck contrib application

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_amcheck contrib application
Date: 2021-03-16 18:40:20
Message-ID: CA+TgmoYKDt_f04gkJpD5Ghi0xz5ukNKktLya5qjz3=iVxLhXUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 16, 2021 at 2:22 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm circling back around to the idea that amcheck is trying to
> validate TOAST references that are already dead, and it's getting
> burnt because something-or-other has already removed the toast
> rows, though not the referencing datums. That's legal behavior
> once the rows are marked dead. Upthread it was claimed that
> amcheck isn't doing that, but this looks like a smoking gun to me.

I think this theory has some legs. From check_tuple_header_and_visibilty():

else if (!(infomask & HEAP_XMAX_COMMITTED))
return true; /*
HEAPTUPLE_DELETE_IN_PROGRESS or
*
HEAPTUPLE_LIVE */
else
return false; /*
HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
}
return true; /* not dead */
}

That first case looks wrong to me. Don't we need to call
get_xid_status() here, Mark? As coded, it seems that if the xmin is ok
and the xmax is marked committed, we consider the tuple checkable. The
comment says it must be HEAPTUPLE_DELETE_IN_PROGRESS or
HEAPTUPLE_LIVE, but it seems to me that if the actual answer is either
HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD depending on whether the
xmax is all-visible. And in the second case the comment says it's
either HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, but I think in that
case it's either HEAPTUPLE_DELETE_IN_PROGRESS or
HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, depending on the XID
status.

Another thought here is that it's probably not wicked smart to be
relying on the hint bits to match the actual status of the tuple in
cases of corruption. Maybe we should be warning about tuples that are
have xmin or xmax flagged as committed or invalid when in fact clog
disagrees. That's not a particularly uncommon case, and it's hard to
check.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-03-16 18:45:13 Re: pg_amcheck contrib application
Previous Message Peter Geoghegan 2021-03-16 18:31:34 Re: Postgres crashes at memcopy() after upgrade to PG 13.