Re: new heapcheck contrib module

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-08-29 17:48:44
Message-ID: 38AF687F-8F6B-48B4-AB9E-A60CFD6CC261@enterprisedb.com
Lists: pgsql-hackers

> On Aug 29, 2020, at 3:27 AM, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
>
>
>> On 29 Aug 2020, at 00:56, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> On Fri, Aug 28, 2020 at 2:10 PM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>> I don't think so. ISTM it's actually the same problem of xmax < relfrozenxid, just hidden behind detoasting.
>>> Our regular heap_check was checking xmin/xmax invariants for tables, but failed to recognise the problem in toast (while toast was accessible until CLOG truncation).
>>
>> The code can (and should, and I think does) refrain from looking up
>> XIDs that are out of the range thought to be valid -- but how do you
>> propose that it avoid looking up XIDs that ought to have clog data
>> associated with them despite being >= relfrozenxid and < nextxid?
>> TransactionIdDidCommit() does not have a suppress-errors flag, adding
>> one would be quite invasive, yet we cannot safely perform a
>> significant number of checks without knowing whether the inserting
>> transaction committed.
>
> What you write seems completely correct to me. I agree that CLOG thresholds lookup seems unnecessary.
>
> But I have a real corruption at hand (on a testing site). I have the heapcheck proposed here, and I have pg_surgery from the thread nearby. Yet I cannot fix the problem, because I cannot list the affected tuples. These tools do not solve a problem that has been neglected for long enough. It would be supercool if they could.
>
> This corruption, like caries, had 3 stages:
> 1. an incorrect VM flag saying that the page does not need vacuum
> 2. xmin and xmax < relfrozenxid
> 3. CLOG truncated
>
> Stage 2 is curable with proposed toolset, stage 3 is not. But they are not that different.

I had an earlier version of the verify_heapam patch that included a non-throwing interface to clog. Ultimately, I ripped that out. My reasoning was that a simpler patch submission was more likely to be acceptable to the community.

If you want to submit a separate patch that creates a non-throwing version of the clog interface, and get the community to accept and commit it, I would seriously consider using that from verify_heapam. If it gets committed in time, I might even do so for this release cycle. But I don't want to make this patch dependent on that hypothetical patch getting written and accepted.
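
For illustration only, the shape of such a non-throwing lookup could be sketched as a standalone toy model like the one below. Every name in it (XidLookupResult, ToyClog, xid_status_nothrow) is hypothetical and not part of PostgreSQL or of the verify_heapam patch. It assumes only normal XIDs, ignores the special reserved XIDs, and models clog as a plain array. The point is that an XID inside [relfrozenxid, nextxid) whose clog data has been truncated away produces a result code rather than an error.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;	/* 32-bit XID, as in PostgreSQL */

/* Hypothetical result codes; not an existing PostgreSQL API. */
typedef enum XidLookupResult
{
	XID_COMMITTED,
	XID_ABORTED,
	XID_IN_PROGRESS,
	XID_OUT_OF_RANGE,			/* outside [relfrozenxid, nextxid) */
	XID_CLOG_MISSING			/* in range, but clog data is gone */
} XidLookupResult;

/* Toy clog: statuses for XIDs in [oldest_kept, oldest_kept + nkept). */
typedef struct ToyClog
{
	TransactionId oldest_kept;
	uint32_t	nkept;
	const XidLookupResult *status;
} ToyClog;

/* Wraparound-aware "a precedes b" (modulo 2^32), normal XIDs only. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Report the status of xid without raising an error.  The caller is told
 * when the XID is out of the valid range or when its clog data is missing.
 */
static XidLookupResult
xid_status_nothrow(const ToyClog *clog, TransactionId xid,
				   TransactionId relfrozenxid, TransactionId nextxid)
{
	uint32_t	off;

	if (xid_precedes(xid, relfrozenxid) || !xid_precedes(xid, nextxid))
		return XID_OUT_OF_RANGE;

	/* Unsigned subtraction wraps, so also check ordering explicitly. */
	off = xid - clog->oldest_kept;
	if (xid_precedes(xid, clog->oldest_kept) || off >= clog->nkept)
		return XID_CLOG_MISSING;

	return clog->status[off];
}
```

With an interface of this shape, a checker could report a tuple whose xmin maps to XID_CLOG_MISSING as corrupt and move on to the next tuple, instead of aborting the whole scan.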


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
