Re: Online verification of checksums

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2019-02-05 10:30:48
Message-ID: 68ffd2de-be8b-b49f-a077-7ab529559d6f@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/5/19 8:01 AM, Andres Freund wrote:
> Hi,
>
> On 2019-02-05 06:57:06 +0100, Fabien COELHO wrote:
>>>>> I'm wondering (possibly again) about the existing early exit if one block
>>>>> cannot be read on retry: the command should count this as a kind of bad
>>>>> block, proceed on checking other files, and obviously fail in the end, but
>>>>> having checked everything else and generated a report. I do not think that
>>>>> this condition warrants a full stop. ISTM that under rare race conditions
>>>>> (eg, an unlucky concurrent "drop database" or "drop table") this could
>>>>> happen when online, although I could not trigger one despite heavy testing,
>>>>> so I'm possibly mistaken.
>>>>
>>>> This seems like a defensible judgement call either way.
>>>
>>> Right now we have a few tests that explicitly check that
>>> pg_verify_checksums fail on broken data ("foo" in the file). Those
>>> would then just get skipped AFAICT, which I think is the worse behaviour
>>> , but if everybody thinks that should be the way to go, we can
>>> drop/adjust those tests and make pg_verify_checksums skip them.
>>>
>>> Thoughts?
>>
>> My point is that it should fail as it does, only not immediately (early
>> exit), but after having checked everything else. This mean avoiding calling
>> "exit(1)" here and there (lseek, fopen...), but taking note that something
>> bad happened, and call exit only in the end.
>
> I can see both as being valuable (one gives you a more complete picture,
> the other a quicker answer in scripts). For me that's the point where
> it's the prerogative of the author to make that choice.
>

Why not make this configurable, using a command-line option?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-02-05 10:41:55 Re: What happens if checkpoint haven't completed until the next checkpoint interval or max_wal_size?
Previous Message Daniel Gustafsson 2019-02-05 10:09:42 Re: Tighten up a few overly lax regexes in pg_dump's tap tests