From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: new heapcheck contrib module |
Date: | 2020-07-04 13:04:21 |
Message-ID: | CAFiTN-uD3WSeggU20yU6voGou6BDb-cQcfkE8B1rAzWKZVK-Bw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Jun 28, 2020 at 11:18 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
>
>
> > On Jun 28, 2020, at 9:05 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > Some more comments on v9_0001.
> > 1.
> > + LWLockAcquire(XidGenLock, LW_SHARED);
> > + nextFullXid = ShmemVariableCache->nextFullXid;
> > + ctx.oldestValidXid = ShmemVariableCache->oldestXid;
> > + LWLockRelease(XidGenLock);
> > + ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
> > ...
> > ...
> > +
> > + for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
> > + {
> > + int32 mapbits;
> > + OffsetNumber maxoff;
> > + PageHeader ph;
> > +
> > + /* Optionally skip over all-frozen or all-visible blocks */
> > + if (skip_all_frozen || skip_all_visible)
> > + {
> > + mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
> > + &vmbuffer);
> > + if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
> > + continue;
> > + if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> > + continue;
> > + }
> > +
> > + /* Read and lock the next page. */
> > + ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
> > + RBM_NORMAL, ctx.bstrategy);
> > + LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
> >
> > I might be missing something, but it appears that first we are getting
> > the nextFullXid and after that, we are scanning the block by block.
> > So while we are scanning the block if the nextXid is advanced and it
> > has updated some tuple in the heap pages, then it seems the current
> > logic will complain about out of range xid. I did not test this
> > behavior so please point me to the logic which is protecting this.
>
> We know the oldest valid Xid cannot advance, because we hold a lock that would prevent it from doing so. We cannot know that the newest Xid will not advance, but when we see an Xid beyond the end of the known valid range, we check its validity, and either report it as a corruption or advance our idea of the newest valid Xid, depending on that check. That logic is in TransactionIdValidInRel.
That makes sense to me.
>
> > 2.
> > /*
> > * Helper function to construct the TupleDesc needed by verify_heapam.
> > */
> > static TupleDesc
> > verify_heapam_tupdesc(void)
> >
> > From function name, it appeared that it is verifying tuple descriptor
> > but this is just creating the tuple descriptor.
>
> In amcheck--1.2--1.3.sql we define a function named verify_heapam which returns a set of records. This is the tuple descriptor for that function. I understand that the name can be parsed as verify_(heapam_tupdesc), but it is meant as (verify_heapam)_tupdesc. Do you have a name you would prefer?
Not very particular, but if we have a name like
verify_heapam_get_tupdesc, But, just a suggestion so it's your choice
if you prefer the current name I have no objection.
>
> > 3.
> > + /* Optionally skip over all-frozen or all-visible blocks */
> > + if (skip_all_frozen || skip_all_visible)
> > + {
> > + mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
> > + &vmbuffer);
> > + if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
> > + continue;
> > + if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> > + continue;
> > + }
> >
> > Here, do we want to test that in VM the all visible bit is set whereas
> > on the page it is not set? That can lead to a wrong result in an
> > index-only scan.
>
> If the caller has specified that the corruption check should skip over all-frozen or all-visible data, then we cannot load the page that the VM claims is all-frozen or all-visible without defeating the purpose of the caller having specified these options. Without loading the page, we cannot check the page's header bits.
>
> When not skipping all-visible or all-frozen blocks, we might like to pin both the heap page and the visibility map page in order to compare the two, being careful not to hold a pin on the one while performing I/O on the other. See for example the logic in heap_delete(). But I'm not sure what guarantees the system makes about agreement between these two bits. Certainly, the VM should not claim a page is all visible when it isn't, but are we guaranteed that a page that is all-visible will always have its all-visible bit set? I don't know if (possibly transient) disagreement between these two bits constitutes corruption. Perhaps others following this thread can advise?
Right, the VM should not claim its all visible when it actually not.
But, IIRC, it is not guaranteed that if the page is all visible then
the VM must set the all visible flag.
> > 4. One cosmetic comment
> >
> > + /* Skip non-varlena values, but update offset first */
> > ..
> > +
> > + /* Ok, we're looking at a varlena attribute. */
> >
> > Throughout the patch, I have noticed that some of your single-line
> > comments have "full stop" whereas other don't. Can we keep them
> > consistent?
>
> I try to use a "full stop" at the end of sentences, but not at the end of sentence fragments. To me, a "full stop" means that a sentence has reached its conclusion. I don't intentionally use one at the end of a fragment, unless the fragment precedes a full sentence, in which case the "full stop" is needed to separate the two. Of course, I may have violated my own rule in a few places, but before I submit a v10 patch with comment punctuation changes, perhaps we can agree on what the rule is? (This has probably been discussed before and agreed before. A link to the appropriate email thread would be sufficient.)
I can see in different files we have followed different rules. I am
fine as far as those are consistent across the file.
> For example:
>
> /* red, green, or blue */
> /* set to pink */
> /* set to blue. We have not closed the file. */
> /* At this point, we have chosen the color. */
>
> The first comment is not a sentence, but the fourth is. The third comment is a fragment followed by a full sentence, and a "full stop" separates the two. As for the second comment, as I recall, verb phrases can be interpreted as a full sentence, as in "Close the door!", when they are meant as commands to the listener, but not otherwise. "set to pink" is not a command to the reader, but rather a description of what the code is doing at that point, so I think of it as a mere verb phrase and not a full sentence.
> Making matters even more complicated, portions of the logic in verify_heapam were taken from sections of code that would ereport(), elog(), or Assert() on corruption, and when I took such code, I sometimes also took the comments in unmodified form. That means that my normal commenting rules don't apply, as I'm not the comment author in such cases.
I agree.
A few more comments.
1.
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but not marked as on disk"));
+ return true;
+ }
+
....
+
+ /*
+ * Must dereference indirect toast pointers before we can check them
+ */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
So first we are checking that if the varatt is not
VARATT_IS_EXTERNAL_ONDISK then we are returning, but just a
few statements down we are checking if the varatt is
VARATT_IS_EXTERNAL_INDIRECT, so seems like unreachable code.
2. Another point related to the same code is that toast_save_datum
always set the VARTAG_ONDISK tag. IIUC, we use
VARTAG_INDIRECT in reorderbuffer for generating temp tuple so ideally
while scanning the heap we should never get
VARATT_IS_EXTERNAL_INDIRECT tuple. Am I missing something here?
3.
+ if (VARATT_IS_1B_E(tp + ctx->offset))
+ {
+ uint8 va_tag = va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx, psprintf("unexpected TOAST vartag %u for "
+ "attribute #%u at t_hoff = %u, "
+ "offset = %u",
+ va_tag, ctx->attnum,
+ ctx->tuphdr->t_hoff, ctx->offset));
+ return false; /* We can't know where the next attribute
+ * begins */
+ }
+ }
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but not marked as on disk"));
+ return true;
+ }
First, we are checking that if VARATT_IS_1B_E and if so we will check
whether its tag is VARTAG_ONDISK or not. But just after that, we will
get the actual attribute pointer and
Again check the same thing with 2 different checks. Can you explain
why this is necessary?
4.
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ confess(ctx,
+ psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ confess(ctx,
+ psprintf("HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set"));
+ }
Maybe we can further expand these checks, like if the tuple is
HEAP_XMAX_LOCK_ONLY then HEAP_UPDATED or HEAP_HOT_UPDATED should not
be set.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Ajin Cherian | 2020-07-04 13:13:01 | Re: Resetting spilled txn statistics in pg_stat_replication |
Previous Message | Peter Eisentraut | 2020-07-04 11:36:44 | Re: warnings for invalid function casts |