Re: Corrupted btree index on HEAD because of covering indexes

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Subject: Re: Corrupted btree index on HEAD because of covering indexes
Date: 2018-04-24 01:13:47
Message-ID: CAH2-WzmRA7eg2QK73d3Oekp9wYucf9L+1VZ5cvJB+2cy4DoLtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 21, 2018 at 6:02 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I refined the amcheck enhancement quite a bit today. It will not just
> check that a downlink is not missing; It will also confirm that it
> wasn't a legitimately interrupted multi-level deletion, by descending
> to the leaf page to match the leaf high key pointer to the top most
> parent, which should be the target page (the page that lacked a
> downlink according to the new Bloom filter). We need to worry about
> multi-level deletions that are interrupted by an error or a hard
> crash, which presents a legitimate case where there'll be no downlink
> for an internal page in its parent. VACUUM is okay with that, so we
> must be too.

Attached patch lets amcheck detect the issue when
bt_index_parent_check() is called, though only when heapallindexed
verification was requested (that is, only when bt_index_parent_check()
is doing something with a Bloom filter already). The new checks will
probably also detect any possible issue with multi-level page
deletions. The patch tightens up our general expectations around
half-dead and fully deleted pages, which seems necessary but also
independently useful.

I'm using work_mem to constrain the size of the second Bloom filter,
whereas the heapallindexed Bloom filter is constrained by
maintenance_work_mem. This seems fine to me, since we have always used
an additional work_mem budget for spool2 when building a unique index
within nbtsort.c. Besides, it will probably be very common for the
downlink Bloom filter to be less than 1% the size of the first Bloom
filter when we have adequate memory for both Bloom filters (i.e. very
small). I thought about mentioning this work_mem allocation in the
docs, but decided that there was no need, since the CREATE INDEX
spool2 work_mem stuff isn't documented anywhere either.

Note that the "c.relkind = 'i'" change in the docs is about not
breaking the amcheck query when local partitioned indexes happen to be
in use (when the user changed the sample SQL query to not just look at
pg_catalog indexes). See the "Local partitioned indexes and
pageinspect" thread I just started for full details.

The new P_ISDELETED() test within bt_downlink_missing_check() is what
actually detects the corruption that the test case causes, since the
fully deleted leaf page still has a sane top parent block number left
behind (that is, we don't get as far as testing "if
(BTreeTupleGetTopParent(itup) == state->targetblock)"; that's not how
the the leaf page can get corrupt in the test case that Michael
posted). Note that there are also two new similar P_ISDELETED() tests
added to two existing functions (bt_downlink_check() and
bt_check_level_from_leftmost()), but those tests won't detect the
corruption that we saw. They're really there to nail down how we think
about fully deleted pages.

--
Peter Geoghegan

Attachment Content-Type Size
0001-Add-missing-and-dangling-downlink-checks-to-amcheck.patch text/x-patch 25.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2018-04-24 01:40:16 Re: Problem while setting the fpw with SIGHUP
Previous Message Amit Langote 2018-04-24 01:12:42 Re: partitioning code reorganization