Re: BUG #17212: pg_amcheck fails on checking temporary relations

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: BUG #17212: pg_amcheck fails on checking temporary relations
Date: 2021-10-04 23:10:05
Message-ID: CAH2-Wzkwb+1GSONMrXByHhgkzD-N59A_kFcvsoA_+M9cNNHm5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Mon, Oct 4, 2021 at 3:36 PM Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> >> I believe this is a bug in amcheck's btree checking functions. Peter, can you take a look?
> >
> > Why do you say that?
>
> Because REINDEX CONCURRENTLY and the bt_index_parent_check() function seem to have lock upgrade hazards that are unrelated to pg_amcheck.

The problem with that argument is that the bt_index_parent_check()
function isn't doing anything particularly special, apart from
dropping the lock. That has been its behavior for many years now.

> On to pg_amcheck's behavior....
>
> I see no evidence in the OP's complaint that pg_amcheck is misbehaving. It launches a worker to check each relation, prints for the user's benefit any errors those checks raise, and finally returns 0 if they all pass and 2 otherwise. Since not all relations could be checked, 2 is returned. Returning 0 would be misleading, as it implies everything was checked and passed, and it can't honestly say that. The return value 2 does not mean that anything failed. It means that not all checks passed. When a 2 is returned, the user is expected to read the output and decide what, if anything, they want to do about it. In this case, the user might decide to wait until the reindex finishes and check again, or they might decide they don't care.
>
> It is true that pg_amcheck is calling bt_index_parent_check() on an invalid index, but so what? If it chose not to do so, it would still need to print a message about the index being unavailable for checking, and it would still have to return 2.

Why would it have to print such a message? You seem to be presenting
this as if there is some authoritative, precise, relevant definition
of "the relations that pg_amcheck sees". But to me the relevant
details look arbitrary at best.

> It can't return 0, and it is unhelpful to leave the user in the dark about the fact that not all indexes are in the right state for checking.

Why is that unhelpful? More to the point, *why* would this alternative
behavior constitute "leaving the user in the dark"?

What about the case where the pg_class entry isn't visible to our MVCC
snapshot? Why is "skipping" such a relation not just as unhelpful?

> I think this bug report is really a feature request. The OP appears to want an option to toggle on/off the printing of such information, perhaps with not printing it as the default.

And I don't understand why you think that clearly-accidental
implementation details (really just bugs) should be treated as
axiomatic truths about how pg_amcheck must work. Should we now "fix"
pg_dump so that it matches pg_amcheck?

All of the underlying errors are cases that were clearly intended to
catch user error -- every single one. But apparently pg_amcheck is
incapable of error, by definition. Like HAL 9000.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Mark Dilger 2021-10-04 23:28:53 Re: BUG #17212: pg_amcheck fails on checking temporary relations
Previous Message PG Bug reporting form 2021-10-04 23:04:38 BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-10-04 23:28:53 Re: BUG #17212: pg_amcheck fails on checking temporary relations
Previous Message Mark Dilger 2021-10-04 22:36:06 Re: BUG #17212: pg_amcheck fails on checking temporary relations