From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-08-05 14:08:50
Message-ID: CA+TgmoaOJY7X6VrE+aqoeJ4cFXbh9qDJSYh6J3BNB8u4zo5Opw@mail.gmail.com
Lists: pgsql-hackers
On Tue, Aug 4, 2020 at 9:06 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> of messed-up with indexes in my experience. The first error really
> does tell you most of what you need to know about any given corrupt
> index. Kind of like how you can bucket the number of cockroaches in
> your home into perhaps three meaningful buckets: 0 cockroaches, at
> least 1 cockroach, and lots of cockroaches. (Even there, if you really
> care about the distinction between the second and third bucket,
> something has gone terribly wrong -- so even three buckets seems like
> a lot to me.)
Not sure I agree with this. As a homeowner, the distinction between 0
and 1 is less significant to me than the distinction between a few
(preferably in places where I'll never see them) and a whole lot. I
agree with you to an extent though: all I really care about is whether
I have too few to worry about, enough that I'd better try to take care
of it somehow, or so many that I need a professional exterminator. If,
however, I were a professional exterminator, I would be unhappy with
just knowing that there are few problems or many. I suspect I would
want to know something about where the problems were, and get a more
nuanced indication of just how bad things are in each location.
FWIW, pg_catcheck is an example of an existing tool (designed by me
and written partially by me) that uses the kind of model I'm talking
about. It does a single SELECT * FROM pg_<whatever> on each catalog
table - so that it doesn't get confused if your system catalog indexes
are messed up - and then performs a bunch of cross-checks on the
tuples it gets back and tells you about all the messed up stuff. If it
can't get data from all your catalog tables, it performs whichever
checks are valid given what data it was able to get. As a professional
exterminator of catalog corruption, I find it quite helpful. If
someone sends me the output from a database cluster, I can tell right
away whether they are just fine, in a little bit of trouble, or in a
whole lot of trouble; I can speculate pretty well about what kind of
thing might've happened to cause the problem; and I can recommend
steps to straighten things out.
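To make that concrete: pg_catcheck itself is written in C, but the sort
of cross-check I mean can be sketched in plain SQL. The enable_* settings
only discourage the planner from relying on possibly-corrupt catalog
indexes; this is an illustration, not the tool's actual code:

    -- Discourage (but not strictly forbid) index scans on the
    -- catalogs, since the indexes themselves may be corrupt.
    SET enable_indexscan = off;
    SET enable_bitmapscan = off;

    -- One example cross-check: every pg_attribute row should point
    -- at an existing pg_class row via attrelid.
    SELECT a.attrelid, a.attnum, a.attname
    FROM pg_attribute a
    WHERE NOT EXISTS
        (SELECT 1 FROM pg_class c WHERE c.oid = a.attrelid);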
> FWIW, current DEBUG1 + DEBUG2 output for amcheck shows you quite a lot
> of details about the tree structure. It's a handy way of getting a
> sense of what's going on at a high level. For example, if index
> corruption is found very early on, that strongly suggests that it's
> pretty pervasive.
Interesting.
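For anyone who wants to see that output, something along these lines
should surface it, assuming amcheck is installed and an index named
my_index (a placeholder):

    CREATE EXTENSION IF NOT EXISTS amcheck;
    SET client_min_messages = debug2;  -- show DEBUG1/DEBUG2 detail
    SELECT bt_index_check('my_index');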
> > A fourth type of checking is to verify the index key against the keys
> > in the heap tuples to which they point, but only for index tuples that
> > passed the basic index-heap sanity checking and where the tuples have
> > not been pruned. This can be sensibly done even if the structural
> > checks or index-ordering checks have failed.
>
> That's going to require the equivalent of a merge join, which is
> terribly expensive relative to such a small benefit.
I think it depends on how big your data is. If you've got a 2TB table
and 512GB of RAM, it's pretty impractical no matter the algorithm. But
for small tables even a naive nested loop will suffice.
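Just to illustrate the shape of that naive nested loop, here's a rough
SQL approximation using pageinspect. my_index and my_table are
placeholder names, this looks at only a single (assumed) leaf page, it
compares heap pointers rather than decoded keys, and dead line pointers
would show up as false positives -- so it's a sketch of the idea, not a
real verifier:

    CREATE EXTENSION IF NOT EXISTS pageinspect;

    -- For each index tuple on one page, try to fetch the heap row its
    -- ctid points at; entries with no matching heap row are suspect.
    SELECT i.itemoffset, i.ctid AS heap_tid
    FROM bt_page_items('my_index', 1) AS i
    LEFT JOIN my_table t ON t.ctid = i.ctid
    WHERE t.ctid IS NULL;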
> Meanwhile, a simple smoke test covering many indexes probably gives
> you a fairly meaningful idea of the extent of the damage, without
> requiring that we do any hard engineering work.
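(The amcheck documentation already has an example of more or less
exactly that kind of smoke test; lightly adapted, this checks the ten
largest catalog indexes:

    CREATE EXTENSION IF NOT EXISTS amcheck;

    SELECT bt_index_check(index => c.oid,
                          heapallindexed => i.indisunique)
    FROM pg_index i
    JOIN pg_class c ON i.indexrelid = c.oid
    JOIN pg_namespace n ON c.relnamespace = n.oid
    JOIN pg_am am ON c.relam = am.oid
    WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
      AND c.relpersistence != 't'   -- skip temp tables
      AND c.relkind = 'i' AND i.indisready AND i.indisvalid
    ORDER BY c.relpages DESC LIMIT 10;
)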
In my experience, when EDB customers complain about corruption-related
problems, the two most common patterns are: (1) my whole system is
messed up and (2) I have one or a few specific objects which are
messed up and everything else is fine. The first category is often
something like inability to start the database, or scary messages in
the log file complaining about, say, checkpoints failing. The second
category is the one I'm worried about here. The people who are in this
category generally already know which things are broken; they've
figured that out through trial and error. Sometimes they miss some
problems, but more frequently, in my experience, their understanding
of what problems they have is accurate. Now that category of users can
be further decomposed into two groups: the people who don't care what
happened and just want to barrel through it, and the people who do
care what happened and want to know what happened, why it happened,
whether it's a bug, etc. The first group is unproblematic: tell them
to REINDEX (or restore from backup, or whatever) and you're done.
The second group is a lot harder. It is in general difficult to
speculate about how something that is now wrong got that way given
knowledge only of the present state of affairs. But good tooling makes
it easier to speculate intelligently. To take a classic example,
there's a great difference between a checksum failure caused by the
checksum being incorrect on an otherwise-valid page; a checksum
failure on a page the first half of which appears valid and the second
half of which looks like it might be some other database page; and a
checksum failure on a page whose contents appear to be taken from a
Microsoft Word document. I'm not saying we ever want a tool which
tries to figure that sort of thing out in an automated way; there's no
substitute for human intelligence (yet, anyway). But, the more the
tools we do have localize the problems to particular pages or tuples
and describe them accurately, the easier it is to do manual
investigation as follow-up, when it's necessary.
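pageinspect is one tool that helps with that kind of manual follow-up:
whether the header of a suspect block still looks like a PostgreSQL
page header, or like the middle of a Word document, is immediately
visible. my_table and block 0 are placeholders here, and
ignore_checksum_failure is a superuser-only developer setting that lets
the read proceed despite the bad checksum:

    CREATE EXTENSION IF NOT EXISTS pageinspect;

    -- Let the page be read even though its checksum doesn't verify.
    SET ignore_checksum_failure = on;

    -- Does this still look like a sane page header?
    SELECT * FROM page_header(get_raw_page('my_table', 0));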
> That having been said, I suspect that this is a huge task for a small
> benefit. It's exceptionally hard to test because you have lots of
> non-trivial code that only gets used in circumstances that by
> definition should never happen. If users really needed to recover the
> data in the index then maybe it would happen -- but they don't.
Yep, that's a very key difference as compared to the heap.
> The biggest problem that amcheck currently has is that it isn't used
> enough, because it isn't positioned as a general purpose tool at all.
> I'm hoping that the work from Mark helps with that.
Agreed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company