Re: heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug (Was: amcheck (B-Tree integrity checking tool))

From: Noah Misch <noah(at)leadboat(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug (Was: amcheck (B-Tree integrity checking tool))
Date: 2017-10-17 03:09:43
Message-ID: 20171017030943.GB539552@rfd.leadboat.com
Lists: pgsql-hackers

On Mon, Oct 16, 2017 at 12:57:39PM -0700, Peter Geoghegan wrote:
> On Fri, Oct 13, 2017 at 7:09 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > The checker should
> > consider circumstances potentially carried from past versions via pg_upgrade.
>
> Right. False positives are simply unacceptable.

False positives are bugs, but they're not exceptionally odious bugs.

> > Fortunately, if you get some details wrong, it's cheap to recover from checker
> > bugs.
>
> Ideally, amcheck will become a formal statement of the contracts
> provided by major subsystems, such as the heapam, the various SLRUs,
> and so on. While I agree that having bugs there is much less severe
> than having bugs in backend code, I would like the tool to reach a
> point where it actually *defines* correctness (by community
> consensus).

That presupposes construction of two pieces of software, the server and the
checker, such that every disagreement is a bug in the server. But checkers
get bugs just like servers get bugs. Checkers do provide a sort of
double-entry bookkeeping. When a reproducible test case prompts a checker
complaint, we'll know *some* code is wrong. That's an admirable contribution.

> If a bug in amcheck reflects a bug in our high level
> thinking about correctness, then that actually is a serious problem.

My notion of data file correctness is roughly this:

A data file is correct if the server's reads and mutations thereof will not
cause it to deviate from documented behavior. Where the documentation isn't
specific, fall back on SQL standards. Where no documentation or SQL
standard addresses a particular behavior, we should debate the matter and
document the decision.

I'm essentially saying that the server is innocent until proven guilty. It
would be cool to have a self-contained specification of PostgreSQL data files,
but where the server disagrees with the spec without causing problem
behaviors, we'd ultimately update the spec to fit the server.

Thanks,
nm
