Re: amcheck hardening

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: amcheck hardening
Date: 2021-03-13 19:06:35
Message-ID: CAH2-WzkSBZNSHsmLnnbr6gVE-u8TnGM7PRi0QyW9DkuKAbCEng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 13, 2021 at 10:35 AM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> The testing strategy I'm using is to corrupt heap and btree pages in schema "public" of the "regression" database created by `make installcheck`, by overwriting random bytes in randomly selected locations on pages after the page header. Then I run `pg_amcheck regression` and see if anything segfaults. Doing this repeatedly, with random bytes and locations within files not the same from one run to the next, I can find the locations of segfaults that are particularly common.

That seems like a reasonable starting point to me.

> The first common problem [1] happens at verify_nbtree.c:1422 when a corruption report is being generated. The generation does not seem entirely safe, and the problematic bit can be avoided

> The temPointerGetBlockNumberNoCheck(tid) seems to be unsafe here.

I think that the best way to further harden verify_nbtree.c at this
point is to do the most basic validation of IndexTuples in our own new
variant of a core bufpage.h macro: PageGetItemCareful(). In other
words, we should invent the IndexTuple equivalent of the existing
PageGetItemIdCareful() function (which already does the same thing for
item pointers), and then replace all current PageGetItem() calls with
PageGetItemCareful() calls -- we ban raw PageGetItem() calls from
verify_nbtree.c forever.

Here is some of the stuff that could go in PageGetItemCareful(), just
off the top of my head:

* The existing "if (tupsize != ItemIdGetLength(itemid))" check at the
top of bt_target_page_check() -- make sure the length from the
caller's line pointer agrees with IndexTupleSize().

* Basic validation against the index's tuple descriptor -- in
particular, that varlena headers are basically sane, and that the
apparent range of datums is safely within the space on the page for
the tuple.

* Similarly, BTreeTupleGetHeapTID() should not be able to return a
pointer that doesn't actually point somewhere inside the space that
the target page has for the IndexTuple.

* No external TOAST pointers, since this is an index AM, and so
doesn't allow that.

In general this kind of very basic validation should be pushed down to
the lowest level code, so that it detects the problem as early as
possible, before slightly higher level code has the opportunity to
run. Higher level code is always going to be at risk of making
assumptions about the data not being corrupt, because there is so much
more of it, and also because it tends to roughly look like idiomatic
AM code.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2021-03-13 19:18:41 Re: pg_amcheck contrib application
Previous Message Mark Dilger 2021-03-13 18:51:27 Re: pg_amcheck contrib application