Re: new heapcheck contrib module

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-10-23 01:47:50
Message-ID: DA994DD7-5E36-4CA4-8FB6-5870B9D8D696@enterprisedb.com
Lists: pgsql-hackers

> On Oct 22, 2020, at 6:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
>> So now I think this is a REDIRECT on either architecture, but the
>> offset and length fields have different values, causing the redirect
>> pointer to point to different places. Maybe it happens to point
>> at a DEAD tuple in the big-endian case.
>
> Just to make sure, I tried this test program:
>
> #include <stdio.h>
> #include <string.h>
>
> typedef struct ItemIdData
> {
>     unsigned    lp_off:15,      /* offset to tuple (from start of page) */
>                 lp_flags:2,     /* state of line pointer, see below */
>                 lp_len:15;      /* byte length of tuple */
> } ItemIdData;
>
> int main()
> {
>     ItemIdData lp;
>
>     memset(&lp, 0x77, sizeof(lp));
>     printf("off = %x, flags = %x, len = %x\n",
>            lp.lp_off, lp.lp_flags, lp.lp_len);
>     return 0;
> }
>
> I get
>
> off = 7777, flags = 2, len = 3bbb
>
> on a little-endian machine, and
>
> off = 3bbb, flags = 2, len = 7777
>
> on big-endian. It'd be less symmetric if the bytes weren't
> all the same ...

I think we're going in the wrong direction here. The idea behind this test was to have as little knowledge about the layout of pages as possible and still verify that damaging the pages would result in corruption reports. Of course, not all damage will result in corruption reports, because some damage looks legit. I think it was just luck (good or bad depending on your perspective) that the damage in the test as committed works on little-endian but not big-endian.

I can embed the knowledge you have researched into the test if you want me to, but my instinct is to go the other direction and have the test know even less about pages. That would work if, instead of expecting corruption every time the test writes the file, we had it just make sure that it gets corruption reports at least some of the times it does so. That seems more maintainable in the long term.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
