Re: new heapcheck contrib module

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-08-06 16:43:17
Message-ID: CA+TgmoaFMHg5tNCdZ-xXYPYPoVCKua8WSBw7dmZ6VxqDiK8VFA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 5, 2020 at 4:36 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Right, but the professional exterminator can be expected to use expert
> level tools, where a great deal of technical sophistication is
> required to interpret what's going on sensibly. An amateur can only
> use them to determine if something is wrong at all, which is usually
> not how they add value.

Quite true.

> I myself seem to have had quite different experiences with corruption,
> presumably because it happened at product companies like Heroku. I
> tended to find software bugs (e.g. the one fixed by commit 008c4135)
> that were rare and novel by casting a wide net over a large number of
> relatively homogenous databases. Whereas your experiences tend to
> involve large support customers with more opportunity for operator
> error. Both perspectives are important.

I concur.

> I wrote my own expert level tool, pg_hexedit. I have to admit that the
> level of interest in that tool doesn't seem to be all that great,
> though I myself have used it to investigate corruption to great
> effect. But I suppose there is no way to know how it's being used.

I admit to not having tried pg_hexedit, but I doubt that it would help
me very much outside of my own development work. The problem is that
in a typical case I am trying to help someone in a professional
capacity without access to their machines, and without knowledge of
their environment or data. Moreover, sometimes the person I'm trying
to help is an unreliable narrator. I can ask people to run tools they
have and send the output, and then I can look at that output and tell
them what to do next. But it has to be a tool they have (or they can
easily get) and it can't involve any complicated if-then stuff.
Something like "if the page is totally garbled then do X but if it
looks mostly OK then do Y" is radically out of reach. They have no
clue about that. Hence my interest in tools that automate as much of
the investigation as may be practical.

We're probably beating this topic to death at this point; I don't
think we are really in any sort of meaningful disagreement, and the
next steps in this particular case seem clear enough.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
