Re: HOT chain validation in verify_heapam()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Himanshu Upadhyaya <upadhyaya(dot)himanshu(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HOT chain validation in verify_heapam()
Date: 2022-09-20 13:13:19
Message-ID: CA+TgmoYD-mx2=2KhW8815p1rU=2Ftndn2O4UEk87JbPN7qrwtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 20, 2022 at 5:00 AM Himanshu Upadhyaya
<upadhyaya(dot)himanshu(at)gmail(dot)com> wrote:
> Please find it attached.

This patch still has no test cases. Just as we have test cases for the
existing corruption checks, we should have test cases for these new
corruption checks, showing cases where they actually fire.

I think I would be inclined to set lp_valid[x] = true in both the
redirected and non-redirected case, and then have the very first thing
that the second loop does be if (nextoffnum == 0 ||
!lp_valid[ctx.offnum]) continue. I think that would be more clear
about the intent to ignore line pointers that failed validation. Also,
if you did it that way, then you could catch the case of a redirected
line pointer pointing to another redirected line pointer, which is a
corruption condition that the current code does not appear to check.

+ /*
+ * Validation via the predecessor array. 1) If the predecessor's
+ * xmin is aborted or in progress, the current tuples xmin should
+ * be aborted or in progress respectively. Also both xmin's must
+ * be equal. 2) If the predecessor's xmin is not frozen, then
+ * current tuple's shouldn't be either. 3) If the predecessor's
+ * xmin is equal to the current tuple's xmin, the current tuple's
+ * cmin should be greater than the predecessor's cmin. 4) If the
+ * current tuple is not HOT then its predecessor's tuple must not
+ * be HEAP_HOT_UPDATED. 5) If the current tuple is HOT then its
+ * predecessor's tuple must be HEAP_HOT_UPDATED.
+ */

This comment needs to be split up into pieces and the pieces need to
be moved closer to the tests to which they correspond.

+ psprintf("unfrozen tuple was
updated to produce a tuple at offset %u which is not frozen",

Shouldn't this say "which is frozen"?

+ * Not a corruption if current tuple is updated/deleted by a
+ * different transaction, means t_cid will point to cmax (that is
+ * command id of deleting transaction) and cid of predecessor not
+ * necessarily will be smaller than cid of current tuple. t_cid

I think that the next person who reads this code is likely to
understand that the CIDs of different transactions are numerically
unrelated. What's less obvious is that if the XID is the same, the
newer update must have a higher CID.

+ * can hold combo command id but we are not worrying here since
+ * combo command id of the next updated tuple (if present) must be
+ * greater than combo command id of the current tuple. So here we
+ * are not checking HEAP_COMBOCID flag and simply doing t_cid
+ * comparison.

I disapprove of ignoring the HEAP_COMBOCID flag. Emitting a message
claiming that the CID has a certain value when that's actually a combo
CID is misleading, so at least a different message wording is needed
in such cases. But it's also not clear to me that the newer update has
to have a higher combo CID, because combo CIDs can be reused. If you
have multiple cursors open in the same transaction, the updates can be
interleaved, and it seems to me that it might be possible for an older
CID to have created a certain combo CID after a newer CID, and then
both cursors could update the same page in succession and end up with
combo CIDs out of numerical order. Unless somebody can make a
convincing argument that this isn't possible, I think we should just
skip this check for cases where either tuple has a combo CID.

+ if (TransactionIdEquals(pred_xmin, curr_xmin) &&
+ (TransactionIdEquals(curr_xmin, curr_xmax) ||
+ !TransactionIdIsValid(curr_xmax)) && pred_cmin >= curr_cmin)

I don't understand the reason for the middle part of this condition --
TransactionIdEquals(curr_xmin, curr_xmax) ||
!TransactionIdIsValid(curr_xmax). I suppose the comment is meant to
explain this, but I still don't get it. If a tuple with XMIN 12345
CMIN 2 is updated to produce a tuple with XMIN 12345 CMIN 1, that's
corruption, regardless of what the XMAX of the second tuple may happen
to be.

+ if (HeapTupleHeaderIsHeapOnly(curr_htup) &&
+ !HeapTupleHeaderIsHotUpdated(pred_htup))

+ if (!HeapTupleHeaderIsHeapOnly(curr_htup) &&
+ HeapTupleHeaderIsHotUpdated(pred_htup))

I think it would be slightly clearer to write these tests the other
way around i.e. check the previous tuple's state first.

+ if (!TransactionIdIsValid(curr_xmax) &&
HeapTupleHeaderIsHotUpdated(tuphdr))
+ {
+ report_corruption(ctx,
+ psprintf("tuple has been updated, but xmax is 0"));
+ result = false;
+ }

I guess this message needs to say "tuple has been HOT updated, but
xmax is 0" or something like that.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-09-20 13:42:25 Re: why can't a table be part of the same publication as its schema
Previous Message Zhang Mingli 2022-09-20 12:52:13 Re: Summary function for pg_buffercache