Re: new heapcheck contrib module

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-11-19 22:51:42
Message-ID: CAH2-Wz=ydSfTrBxLtDrk97anhaNzH3iHaUZPjujLF5+9yd=W5A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 19, 2020 at 1:50 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> It makes sense to me to have a "don't run through minefields" option, and a "go ahead, run through minefields" option for pg_amcheck, given that users in differing situations will have differing business consequences to bringing down the server in question.

This kind of framing suggests zero-risk bias to me:

https://en.wikipedia.org/wiki/Zero-risk_bias

It's simply not helpful to think of the risks as "running through a
minefield" versus "not running through a minefield". I also dislike
this framing because in reality nobody runs through a minefield,
unless perhaps it's a battlefield and the alternative is even
worse. Risks are not discrete -- they're continuous. And they're
situational.

I accept that there are certain reasonable gradations in the degree to
which a segfault is bad, even in contexts in which pg_amcheck runs
into actual serious problems. And as Robert points out, experience
suggests that on average people care about availability the most when
push comes to shove (though I hasten to add that that's not the same
thing as considering a once-off segfault to be the greater evil here).
Even so, I firmly believe that it's a mistake to assign *infinite*
weight to not having a segfault. That is likely to have certain
unintended consequences that could be even worse than a segfault, such
as not detecting pernicious corruption over many months because our
can't-segfault version of core functionality fails to have the same
bugs as the actual core functionality (and thus fails to detect a
problem in the core functionality).

The problem with giving infinite weight to any one bad outcome is that
it makes it impossible to draw reasonable distinctions between it and
some other extreme bad outcome. For example, I would really not like
to get infected with Covid-19. But I also think that it would be much
worse to get infected with Ebola. It follows that Covid-19 must not be
infinitely bad, because if it is then I can't make this useful
distinction -- which might actually matter. If somebody hears me say
this, and takes it as evidence of my lackadaisical attitude towards
Covid-19, I can live with that. I care about avoiding criticism as
much as the next person, but I refuse to prioritize it over all other
things.

> I doubt other backend hardening is any more likely to get committed.

I suspect you're right about that, because of the risk of causing
real harm to users.

The backend code is obviously *not* written with the assumption that
data cannot be corrupt. There are lots of specific ways in which it is
hardened (e.g., there are many defensive "can't happen" elog()
statements). I really don't know why you insist on this
black-and-white framing.

--
Peter Geoghegan
