Re: new heapcheck contrib module

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-11-19 19:47:53
Message-ID: CAH2-Wz=dy--FG5iJ0kPcQumS0W5g+xQED3t-7HE+UqAK_hmLTw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 19, 2020 at 9:06 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I'm also not sure if these descriptions are clear enough, but it may
> also be hard to do a good job in a brief space. Still, comparing this
> to the documentation of heapallindexed makes me rather nervous. This
> is only trying to verify that the index contains all the tuples in the
> heap, not that the values in the heap and index tuples actually match.

That's a good point. As things stand, heapallindexed verification does
not notice when there are extra index tuples in the index that are in
some way inconsistent with the heap. Hopefully this isn't too much of
a problem in practice because the presence of extra spurious tuples
gets detected by the index structure verification process. But in
general that might not happen.

Ideally heapallindexed verification would verify 1:1 correspondence. It
doesn't do that right now, but it could.

This could work by having two bloom filters -- one for the heap,
another for the index. Initially the implementation would look for the
absence of index tuples that should be in the index, just like today.
But at the end it would modify the index bloom filter by &='ing it with
the complement of the heap bloom filter. If any bits are left set in
the index bloom filter, we go back through the index once more and
locate index tuples that have at least some matching bits in the index
bloom filter (we cannot expect all of the bits from each of the hash
functions used by the bloom filter to still be matches).

From here we can do some kind of lookup for the maybe-not-matching
index tuples that we locate. Make sure that they point to an LP_DEAD
line pointer in the heap or something. Make sure that they have the
same values as the heap tuple if it's still retrievable (i.e. if we
haven't pruned the heap tuple away already).
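
To make the comparison step concrete, here is a minimal standalone C
sketch of the two-filter idea. To be clear, this is not amcheck code
and doesn't use its bloomfilter.h API; the ToyBloom struct, the hash
function, the filter size, and the integer "keys" standing in for index
tuple fingerprints are all invented for illustration:

/*
 * Toy sketch of the "two bloom filters" idea -- invented names, not amcheck.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FILTER_WORDS 1024   /* 65,536-bit toy filter */
#define NUM_HASHES   3      /* probe bits per element */

typedef struct ToyBloom
{
    uint64_t    bits[FILTER_WORDS];
} ToyBloom;

/* cheap stand-in for NUM_HASHES independent hash functions */
static uint64_t
toy_hash(uint64_t key, int i)
{
    uint64_t    h = key * 0x9E3779B97F4A7C15ULL +
        (uint64_t) i * 0xBF58476D1CE4E5B9ULL;

    return (h ^ (h >> 31)) % (FILTER_WORDS * 64);
}

static void
bloom_add(ToyBloom *f, uint64_t key)
{
    for (int i = 0; i < NUM_HASHES; i++)
    {
        uint64_t    bit = toy_hash(key, i);

        f->bits[bit / 64] |= UINT64_C(1) << (bit % 64);
    }
}

/* "&= the complement": clear index filter bits that the heap filter also set */
static bool
bloom_subtract(ToyBloom *ind, const ToyBloom *heap)
{
    bool        any_left = false;

    for (int w = 0; w < FILTER_WORDS; w++)
    {
        ind->bits[w] &= ~heap->bits[w];
        if (ind->bits[w] != 0)
            any_left = true;
    }
    return any_left;
}

/* after subtraction: does this key still have at least one probe bit set? */
static bool
bloom_maybe_unmatched(const ToyBloom *residual, uint64_t key)
{
    for (int i = 0; i < NUM_HASHES; i++)
    {
        uint64_t    bit = toy_hash(key, i);

        if (residual->bits[bit / 64] & (UINT64_C(1) << (bit % 64)))
            return true;
    }
    return false;
}

int
main(void)
{
    ToyBloom    heap_filter = {{0}};
    ToyBloom    index_filter = {{0}};
    uint64_t    heap_keys[] = {101, 102, 103};
    uint64_t    index_keys[] = {101, 102, 103, 999};    /* 999 is spurious */

    for (int i = 0; i < 3; i++)
        bloom_add(&heap_filter, heap_keys[i]);
    for (int i = 0; i < 4; i++)
        bloom_add(&index_filter, index_keys[i]);

    /* the second index pass is only needed if any bits survive */
    if (bloom_subtract(&index_filter, &heap_filter))
    {
        for (int i = 0; i < 4; i++)
        {
            if (bloom_maybe_unmatched(&index_filter, index_keys[i]))
                printf("recheck index tuple with key %llu against the heap\n",
                       (unsigned long long) index_keys[i]);
        }
    }
    return 0;
}

The idea, as I see it, is that a matched index tuple has all of its
probe bits cleared by the subtraction (the heap filter set the same
bits), so only tuples with no heap counterpart are expected to keep any
bits set -- though collisions mean we can't count on all of their bits
surviving, per the parenthetical above.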

> This to me seems too conservative. The result is that by default we
> check only tables, not indexes. I don't think that's going to be what
> users want. I don't know whether they want the heapallindexed or
> rootdescend behaviors for index checks, but I think they want their
> indexes checked. Happy to hear opinions from actual users on what they
> want; this is just me guessing that you've guessed wrong. :-)

My thoughts on these two options:

* I don't think that users will ever want rootdescend verification.

That option exists now because I wanted to have something that relied
on the uniqueness property of B-Tree indexes following the Postgres 12
work. I didn't add retail index tuple deletion, so it seemed like a
good idea to have something that makes the same assumptions that it
would have to make, in order to validate the design.

Another factor is that Alexander Korotkov made the basic
bt_index_parent_check() tests a lot better for Postgres 13. This
undermined the practical argument for using rootdescend verification.

Finally, note that bt_index_parent_check() was always supposed to be
something that was to be used only when you already knew that you had
big problems, and wanted absolutely thorough verification without
regard for the costs. This isn't the common case at all. It would be
reasonable to not expose anything from bt_index_parent_check() at all,
or to give it much less prominence. Not really sure of what the right
balance is here myself, so I'm not insisting on anything. Just telling
you what I know about it.

* heapallindexed is kind of expensive, but valuable. But the extra
check is probably less likely to help on the second or subsequent
index on a table.

It might be worth considering an option that uses it with only one
index: preferably the primary key index, failing that some unique
index, and failing that some other index.
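
Just to illustrate that preference order (the CandidateIndex struct and
the function here are hypothetical, not anything from pg_amcheck):

#include <stdbool.h>
#include <stddef.h>

typedef struct CandidateIndex
{
    const char *name;
    bool        is_primary;
    bool        is_unique;
} CandidateIndex;

/* pick one index per table: primary key, else a unique index, else anything */
static const CandidateIndex *
pick_heapallindexed_index(const CandidateIndex *indexes, size_t nindexes)
{
    const CandidateIndex *best = NULL;

    for (size_t i = 0; i < nindexes; i++)
    {
        const CandidateIndex *ind = &indexes[i];

        if (ind->is_primary)
            return ind;         /* primary key wins outright */
        if (best == NULL || (ind->is_unique && !best->is_unique))
            best = ind;         /* unique beats non-unique, else first seen */
    }
    return best;                /* NULL if the table has no indexes at all */
}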

> This seems pretty lame to me. Even if the btree checker can't tolerate
> corruption to the extent that the heap checker does, seg faulting
> because of a missing file seems like a bug that we should just fix
> (and probably back-patch). I'm not very convinced by the decision to
> override the user's decision about heapallindexed either.

I strongly agree.

> Maybe I lack
> imagination, but that seems pretty arbitrary. Suppose there's a giant
> index which is missing entries for 5 million heap tuples and also
> there's 1 entry in the table which has an xmin that is less than the
> pg_class.relfrozenxid value by 1. You are proposing that because I have
> the latter problem I don't want you to check for the former one. But
> I, John Q. Smartuser, do not want you to second-guess what I told you
> on the command line that I wanted. :-)

Even if your user is just average, they still have one major advantage
over the architects of pg_amcheck: actual knowledge of the problem in
front of them.

> I think in general you're worrying too much about the possibility of
> this tool causing backend crashes. I think it's good that you wrote
> the heapcheck code in a way that's hardened against that, and I think
> we should try to harden other things as time permits. But I don't
> think that the remote possibility of a crash due to the lack of such
> hardening should dictate the design behavior of this tool. If the
> crash possibilities are not remote, then I think the solution is to
> fix them, rather than cutting out important checks.

I couldn't agree more.

I think that you need to have a kind of epistemic modesty with this
stuff. Okay, we guarantee that the backend won't crash when certain
amcheck functions are run, based on these caveats. But don't we always
guarantee something like that? And are the specific caveats actually
that different in each case, when you get right down to it? A
guarantee does not exist in a vacuum. It always has implicit
limitations. For example, any guarantee implicitly comes with the
caveat "unless I, the guarantor, am wrong". Normally this doesn't
really matter because normally we're not concerned about extreme
events that will probably never happen even once. But amcheck is very
much not like that. The chances of the guarantor being the weakest
link are actually rather high. Everyone is better off with a design
that accepts this view of things.

I'm also suspicious of guarantees like this for less philosophical
reasons. It seems to me like it solves our problem rather than the
user's problem. Having data that is so badly corrupt that it's
difficult to avoid segfaults when we perform some kind of standard
transformations on it is an appalling state of affairs for the user.
The segfault itself is very much not the point at all. We should focus
on making the tool as thorough and low overhead as possible. If we
have to make the tool significantly more complicated to avoid
extremely unlikely segfaults then we're actually doing the user a
disservice, because we're increasing the chances that we, the
guarantors, will be the weakest link (chances which were already high
enough).
This smacks of hubris.

I also agree that hardening is a worthwhile exercise here, of course.
We should be holding amcheck to a higher standard when it comes to not
segfaulting with corrupt data.

--
Peter Geoghegan
