From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Subject: | Re: heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug (Was: amcheck (B-Tree integrity checking tool)) |
Date: | 2017-10-16 19:57:39 |
Message-ID: | CAH2-Wz=4C2_m=EKZxuJRwh_hTVgLzaaussNNxeh_Oi_QxS9Spw@mail.gmail.com |
Lists: | pgsql-hackers |

On Fri, Oct 13, 2017 at 7:09 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> All good questions; I don't know offhand. Discovering those answers is
> perhaps the chief labor required of such a project.
ISTM that by far the hardest part of the project is arriving at a
consensus around what a good set of invariants for CLOG and MultiXact
looks like.

I think that it's fair to say that this business with relfrozenxid now
appears to be more complicated than many of us would have thought. Or,
at least, more complicated than I thought when I first started
thinking about it. Once we're measuring this complexity (by having
checks), we should be in a better position to keep it under control,
and to avoid ambiguity.
> The checker should
> consider circumstances potentially carried from past versions via pg_upgrade.
Right. False positives are simply unacceptable.
> Fortunately, if you get some details wrong, it's cheap to recover from checker
> bugs.
Ideally, amcheck will become a formal statement of the contracts
provided by major subsystems, such as the heapam, the various SLRUs,
and so on. While I agree that having bugs there is much less severe
than having bugs in backend code, I would like the tool to reach a
point where it actually *defines* correctness (by community
consensus). If a bug in amcheck reflects a bug in our high-level
thinking about correctness, then that actually is a serious problem.
Arguably, it's the most costly variety of bug that Postgres can have.

I may never be able to get general buy-in here; building broad
consensus like that is a lot harder than writing some code for a
contrib module. Making the checking code the *authoritative* record of
how invariants are *expected* to work is a major goal of the project,
though.
--
Peter Geoghegan