Re: cannot freeze committed xmax

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: cannot freeze committed xmax
Date: 2024-11-20 22:25:37
Message-ID: 617882E6-22E2-4A39-9BF1-02375F4C20A0@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Nov 20, 2024, at 6:39 AM, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
>
>
>> On 20 Nov 2024, at 15:58, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>
>> PFA the patch doing so.
>
> Ugh. The patch is simply dysfunctional, sorry. xmax_status is being checked uninitiated.
> But, well, it highlights the idea: make verify_heapam() aware of such corruptions.
> What do you think?

I like the idea of increasing the corruption checking coverage. The worry with these patches is that we'll overlook some legitimate use case of the status bits and call it corruption when it isn't. Indeed, that appears to be the case with this patch, assuming I initialized the xmax_status field in the way you had in mind, and that applying it to REL_17_STABLE is ok. (Maybe this would work differently on HEAD?)

+ get_xid_status(xmax, ctx, &xmax_status);
+ if (xmax_status == XID_COMMITTED && (tuphdr->t_infomask & HEAP_UPDATED))
+ {
+ report_corruption(ctx,
+ psprintf("committed xmax %u while tuple has HEAP_XMAX_INVALID and HEAP_UPDATED flags",
+ xmax));
+ }

That results in TAP test failures on a uncorrupted but frozen table:

# +++ tap check in contrib/amcheck +++
t/001_verify_heapam.pl ......... 74/?
# Failed test 'all-frozen not corrupted table'
# at t/001_verify_heapam.pl line 53.
# got: '30|117||committed xmax 2 while tuple has HEAP_XMAX_INVALID and HEAP_UPDATED flags'
# expected: ''
t/001_verify_heapam.pl ......... 257/? # Looks like you failed 1 test of 272.
t/001_verify_heapam.pl ......... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/272 subtests

The first part of your patch which checks the xmin_status seems ok at first glance.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2024-11-20 22:44:36 Re: Return pg_control from pg_backup_stop().
Previous Message Tomas Vondra 2024-11-20 22:19:52 Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly